Listen!
After all, if the stars are lit, does that mean someone needs it?
V.V. Mayakovsky, 1914

I program embedded systems, and I decided to write this article to better understand the issues around the fork() and vfork() system calls. The second one is often discouraged, but it clearly appeared for a reason.
Let's see when and why each of these calls is the better choice.
As a bonus, I will describe the vfork()/fork() implementations in our project. My interest is primarily in using these calls in embedded systems, and the main feature of these implementations is the absence of virtual memory. Perhaps Habr readers who are well versed in system programming and embedded systems will offer advice and share their experience.
If you are interested, read on.
Let's start with the definitions, that is, with the POSIX standard, where these functions are specified:
fork() creates an exact copy of the process, with the exception of a few variables. On success, the function returns zero to the child process and the child's process ID to the parent (from then on the processes “live their own lives”).
vfork() is defined as fork() with the following restriction: the behavior is undefined if the process created by it does at least one of the following:
- Returns from the function in which vfork() was called;
- Calls any function other than _exit() or exec*();
- Modifies any data other than the variable that stores the value returned by vfork().
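To make the restrictions concrete, here is a minimal sketch of a child that stays within them: it only calls execv() and, if that fails, _exit(). The helper name run_program() and the exit code 127 for a failed exec are assumptions made for illustration, and the _DEFAULT_SOURCE define is a glibc detail needed because newer POSIX editions no longer declare vfork().

```c
#define _DEFAULT_SOURCE /* glibc: expose vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * run_program() is a hypothetical helper, not part of the article's project.
 * It returns the exit status of the started program, or -1 on failure.
 */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: modify no data, do not return, call only exec*() or _exit(). */
        execv(path, argv);
        _exit(127); /* reached only if execv() failed */
    }
    /* Parent: resumes once the child has called execv() or _exit(). */
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_program("/bin/sh", argv) with argv set to { "sh", "-c", "exit 3", NULL } should return 3 on a system where /bin/sh exists.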
To understand why a system call with such strict limitations exists at all, we need to figure out what an exact copy of a process is.
One of the first search results on this topic in Russian is a description of the process cloning parameters in Linux. It follows from it that some things can be shared between the parent and child processes:
- Address space (CLONE_VM);
- File system information (CLONE_FS);
- Open file table (CLONE_FILES);
- Table of signal handlers (CLONE_SIGHAND);
- Parent process (CLONE_PARENT).
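The effect of CLONE_VM can be observed directly with the Linux-specific clone() wrapper from glibc. The sketch below is an illustration under assumptions: the helper names and the 64 KiB stack size are made up, and CLONE_VFORK is added only so that the parent waits for the child, as it does with vfork().

```c
#define _GNU_SOURCE /* clone() and the CLONE_* flags are Linux/glibc extensions */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024) /* arbitrary size chosen for the sketch */

/* Child entry point: writes 1 through the pointer it was given. */
static int child_fn(void *arg)
{
    *(volatile int *)arg = 1;
    return 0; /* the child terminates with this exit code */
}

/*
 * Hypothetical helper: returns the value the parent sees after the child
 * has written to the variable, or -1 on error.
 */
int clone_child_writes_visible(int share_vm)
{
    volatile int flag = 0;
    int flags = SIGCHLD | (share_vm ? (CLONE_VM | CLONE_VFORK) : 0);
    char *stack = malloc(CHILD_STACK_SIZE);
    pid_t pid;

    if (stack == NULL) {
        return -1;
    }
    /* The stack grows down on common architectures, so pass its top. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, (void *)&flag);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return flag;
}
```

With share_vm set, the child's write lands in the parent's memory; without it, the child writes to its own copy and the parent still sees 0.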
POSIX does not allow a vfork() child to modify variables, which suggests that the address space is what gets shared.
The following description confirms the assumption:
Unlike fork(), vfork() does not create a copy of the parent process, but creates an address space that is shared with the parent process until the _exit function or one of the exec functions is called.
The parent process suspends its execution during this time. All the restrictions on use follow from this: the child process cannot change any global variables, or even ordinary variables that are shared with the parent process.
In other words, if this statement is true, both processes will see the same data after vfork() is called.
Let's run an experiment. If this is true, changes made to data in the child process must be visible in the parent process, and vice versa.
Code testing the assumption.

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static int create_process(void) {
        pid_t pid;
        int status;
        int common_variable;

        common_variable = 0;
        pid = fork();
        if (-1 == pid) {
            return errno;
        }
        if (pid == 0) {
            /* child */
            common_variable = 1;
            /* _exit() rather than exit(): the child must not run exit handlers */
            _exit(EXIT_SUCCESS);
        }
        waitpid(pid, &status, 0);
        if (common_variable) {
            puts("vfork(): common variable has been changed.");
        } else {
            puts("fork(): common variable hasn't been changed.");
        }
        return EXIT_SUCCESS;
    }

    int main(void) {
        return create_process();
    }
If you build and run this program, you get the output:

    fork(): common variable hasn't been changed.

When fork() is replaced with vfork(), the output changes:

    vfork(): common variable has been changed.

Many rely on this property to pass data between processes, although POSIX does not define the behavior of such programs. This is probably the source of the problems that lead people to advise against vfork().
Indeed, it is one thing when a developer deliberately changes the value of some variable, and quite another when they forget that the child process must not, for example, return from the function in which vfork() was called (because that would destroy the stack frame of the parent process!). And even when done deliberately, undocumented behavior is, as usual, used at your own risk.
But there are also a couple of less obvious problems:
- The book “Secure Programming for Linux and Unix HOWTO” says that even if the child does not modify any data in the high-level language code, this may not hold in the machine code (for example, because of hidden temporary variables).
- One blog post examines the following question: what if vfork() is called in a multithreaded application? Consider the vfork() implementation in Linux: the manual says that the parent process is suspended when it is called, but in fact only the current thread is suspended (which is, of course, easier to implement). This means that the child process keeps running in parallel with the other threads, which may, for example, change the privileges of the parent process. And then things get really bad: we end up with two processes with different privileges in the same address space, which opens a security hole.
Now consider the functions of the exec* family. They are the only functions (apart from _exit()) that may be called in a process created with vfork(). They create a new address space and then load the code and data from the specified file into it. The old address space is, in effect, destroyed.
Therefore, if a process is created with fork() and then calls exec*(), creating (copying) the address space in fork() was redundant. This is a fairly expensive operation, and it can account for most of the time spent in fork(). Wikipedia, for example, gives this point the most attention and, unlike the standard, states it directly:
Space for the child. Copy of the parent process.
Of course, on most modern systems with virtual memory no copying actually takes place; all pages in the memory of the parent process are simply marked copy-on-write. However, the entire hierarchy of page tables still has to be traversed, and this takes time.
It follows that a vfork() call should run faster than fork(), which the Linux man page also mentions.
Let's run another experiment to make sure this is true. We change the previous example slightly: add a loop that creates 1000 processes, and remove the shared variable and the screen output.
The resulting code.

    #include <sys/types.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static int create_process(void) {
        pid_t pid;
        int status;

        pid = vfork();
        if (-1 == pid) {
            return errno;
        }
        if (pid == 0) {
            /* child: only _exit() or exec*() may be called after vfork() */
            _exit(EXIT_SUCCESS);
        }
        waitpid(pid, &status, 0);
        return EXIT_SUCCESS;
    }

    int main(void) {
        int i;

        for (i = 0; i < 1000; i++) {
            create_process();
        }
        return EXIT_SUCCESS;
    }

Run it under the time command.
Output when using fork() | Output when using vfork()
real 0m0.135s | real 0m0.028s
user 0m0.000s | user 0m0.000s
sys  0m0.052s | sys  0m0.016s
The result is, to put it mildly, impressive. The numbers vary slightly from run to run, but vfork() is still 4 to 5 times faster.
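The same comparison can also be made from inside a program. The sketch below is one possible way to do it, assuming a POSIX system with clock_gettime(); the helper name time_process_creation() is made up, and the absolute numbers will of course depend on the machine.

```c
#define _DEFAULT_SOURCE /* glibc: expose vfork() and clock_gettime() */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates and reaps `count` children one after another
 * and returns the elapsed wall-clock time in seconds, or -1.0 on error.
 */
double time_process_creation(int use_vfork, int count)
{
    struct timespec begin, end;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &begin);
    for (i = 0; i < count; i++) {
        pid_t pid = use_vfork ? vfork() : fork();

        if (pid == -1) {
            return -1.0;
        }
        if (pid == 0) {
            _exit(0); /* the child does nothing but exit */
        }
        waitpid(pid, NULL, 0);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - begin.tv_sec)
        + (double)(end.tv_nsec - begin.tv_nsec) / 1e9;
}
```

Comparing time_process_creation(0, 1000) with time_process_creation(1, 1000) gives a rough equivalent of the two columns above, without the cost of starting the test program itself.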
The conclusions are as follows:
- fork() is the “heavier” call, so if vfork() can be used, it is better to use it.
- vfork() is the less safe call: it is easier to shoot yourself in the foot with it, so it should be used with care.
- fork()/vfork() should be used where the process needs separate resources of its own (inodes, user, working directory); otherwise it is worth using pthread_*() threads, which are faster still.
- fork() is best used when you really need a separate address space. However, it is very hard to implement on small processor platforms without hardware support for virtual memory.
Before moving on to the second part of the article, I should note that POSIX also has the posix_spawn() function. In effect, it combines vfork() and exec(), and so avoids the problems associated with vfork() without re-creating the address space as fork() does.
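A minimal sketch of posix_spawn() in use, assuming default attributes and no file actions (both passed as NULL). The helper name spawn_and_wait() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ; /* the environment handed to the new program */

/*
 * Hypothetical helper: starts `path` with `argv`, waits for it and returns
 * its exit status, or -1 on failure.
 */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawn() returns an error number directly instead of setting errno. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0) {
        return -1;
    }
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The caller never runs any code of its own in the child, so none of the vfork() restrictions can be violated by accident.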
Now let's move on to our fork()/vfork() implementation without MMU support.
vfork() implementation
When implementing vfork() in our system, we assumed that a vfork() call should work like this: the parent goes into a waiting state, and the child is the first to return from vfork(); it wakes the parent up when it calls _exit() or exec*(). This means that the child can run on the parent's stack, but with its own resources of other kinds: inodes, signal table, and so on.
In our project, resources of all kinds are held by a task (struct task). This structure describes all the resources of a process, including the available memory, the inodes, and the list of threads that belong to the process. A task always has a main thread, the one created when the task is initialized. In our system a thread is the unit of scheduling; more on this in my colleague's article. Since the stack is owned by a thread rather than a task, two implementation options are possible:
- Replace the stack of the newly created thread with the parent's stack;
- “Swap” the task for a new one under the same thread of execution.
One way or another, a task has to be created, or rather inherited from the parent: the signal table, the environment variables, and so on are cloned. The address space, however, is not inherited.
The return from vfork() happens twice: once for the child process and once for the parent. This means that the registers of the stack frame from which vfork() was called must be saved somewhere. They cannot be kept on the stack, because the child process may overwrite these values while it runs. However, the vfork() signature provides no buffer, so the registers are first stored on the stack and only then copied somewhere in the parent task. Saving the registers on the stack could have been done with a system call, but we decided to do without one and did it ourselves. Naturally, the vfork() function is written in assembly.
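Code for the i386 architecture.

    vfork:
        subl    $28, %esp           /* room for trapno, err, eip, cs, eflags, esp, ss */
        pushl   %ds
        pushl   %es
        pushl   %fs
        pushl   %gs
        pushl   %eax
        pushl   %ebp
        pushl   %edi
        pushl   %esi
        pushl   %edx
        pushl   %ecx
        pushl   %ebx
        movl    PT_END(%esp), %ecx  /* return address, assuming PT_END is the size of struct pt_regs */
        movl    %ecx, PT_EIP(%esp)
        pushf
        popl    PT_EFLAGS(%esp)
        movl    %esp, %eax
        addl    $PT_END+4, %eax     /* stack pointer as it will be after vfork() returns */
        movl    %eax, PT_ESP(%esp)
        push    %esp                /* argument: pointer to the saved registers */
        call    vfork_body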
Thus, the registers are first stored on the stack, and then the C function vfork_body() is called. It receives a pointer to a structure holding the register set as its argument.
The structure in question for i386.

    typedef struct pt_regs {
        uint32_t ebx;
        uint32_t ecx;
        uint32_t edx;
        uint32_t esi;
        uint32_t edi;
        uint32_t ebp;
        uint32_t eax;
        uint32_t gs;
        uint32_t fs;
        uint32_t es;
        uint32_t ds;
        uint32_t trapno;
        uint32_t err;
        uint32_t eip;
        uint32_t cs;
        uint32_t eflags;
        uint32_t esp;
        uint32_t ss;
    } pt_regs_t;
The vfork_body() code is architecture-independent. It is responsible for creating the task and saving the registers needed for the return.
Code of the vfork_body() function.

    void __attribute__((noreturn)) vfork_body(struct pt_regs *ptregs) {
        struct task *child;
        pid_t child_pid;
        struct task_vfork *task_vfork;
        int res;

        /* can vfork only in single thread application */
        assert(thread_self() == task_self()->tsk_main);

        child_pid = task_prepare("");
        if (0 > child_pid) {
            ptregs_retcode_err_jmp(ptregs, -1, child_pid);
            panic("vfork_body returning");
        }
        child = task_table_get(child_pid);

        /* save the registers for the return from vfork() */
        task_vfork = task_resource_vfork(child->parent);
        memcpy(&task_vfork->ptregs, ptregs, sizeof(task_vfork->ptregs));

        res = vfork_child_start(child);
        if (res < 0) {
            /* could not start the child task */
            vfork_child_done(child, vfork_body_exit_stub, &res);
            ptregs_retcode_err_jmp(&task_vfork->ptregs, -1, -res);
        }
        panic("vfork_body returning");
    }
A few notes on the code.
First comes the multithreading check (the problems that multithreading causes with vfork() were discussed above). Then a new task is created, and if that succeeds, the registers needed to return from vfork() are saved in the parent task.
After that, vfork_child_start() is called, which, as the name implies, “starts” the child process. The quotation marks are not accidental: in fact the task may be started later; it all depends on the specific implementation, of which our project has two. Before describing them, consider the _exit() and exec*() functions.
When they are called, the parent thread must be unblocked. We will say that this is the moment when the child process is fully started as a separate entity in the system.
Code of the execv() function.

    int execv(const char *path, char *const argv[]) {
        struct task *task;

        task = task_self();
        task_resource_exec(task, path, argv);
        vfork_child_done(task, task_exec_callback, NULL);
        return 0;
    }

The other functions of the exec* family are expressed through execv().
Code of the _exit() function.

    void _exit(int status) {
        struct task *task;

        task = task_self();
        vfork_child_done(task, task_exit_callback, (void *)status);

        task_start_exit();
        {
            task_do_exit(task, TASKST_EXITED_MASK | (status & TASKST_EXITST_MASK));
            kill(task_get_id(task_get_parent(task)), SIGCHLD);
        }
        task_finish_exit();

        panic("Returning from _exit");
    }
As the code above shows, the parent process is unblocked by vfork_child_done(), which takes a handler as one of its parameters. To implement a particular algorithm, the following must be provided:
- vfork_child_start(): called at the start of cloning; it must block the parent process;
- vfork_child_done(): called when the child process is finally started; it unblocks the parent process;
- task_exit_callback(): the handler for terminating the child process;
- task_exec_callback(): the handler for fully starting the child process.
First implementation
The idea of the first implementation is to reuse not only the same stack but also the same thread of execution. In this case it is enough to “swap” the current thread's task for the child one until the child task is finally started by a call to vfork_child_done().
Code of the vfork_child_start() function.

    int vfork_child_start(struct task *child) {
        thread_set_task(thread_self(), child);
        task_vfork_start(child);
        ptregs_retcode_jmp(&task_resource_vfork(child->parent)->ptregs, 0);
        panic("vfork_child_start returning");
        return -1;
    }
Here is what happens: the current thread of execution (that is, the parent's thread) is bound to the child process by thread_set_task(); it is enough to change the corresponding pointer in the structure of the current thread. From then on, when the thread accesses resources associated with its task, it refers to the child task rather than the parent task as before. For example, when the thread asks which task it belongs to (the task_self() function), it gets the child task.
After that, the child task is marked as created by vfork; this flag is needed for vfork_child_done() to do the right thing (more on this a little later).
Then the registers saved by the vfork() call are restored. Recall that according to POSIX, vfork() must return zero to the child process, which is done by calling ptregs_retcode_jmp(ptregs, 0).
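The idea of returning from the same call a second time with a different value can be tried out in portable C with setjmp()/longjmp(). The sketch below is only an analogy, not the project's code: the jmp_buf plays the role of the saved pt_regs, longjmp() plays the role of ptregs_retcode_jmp(), and the function name and the value 42 are made up for illustration.

```c
#include <setjmp.h>

/* The saved context lives outside the stack frame, like pt_regs kept in the task. */
static jmp_buf saved_context;

/*
 * Hypothetical demo: returns 100 * (number of times setjmp() returned)
 * plus the value delivered by the second return.
 */
int return_twice_demo(void)
{
    volatile int returns = 0; /* volatile: the value must survive longjmp() */

    switch (setjmp(saved_context)) {
    case 0:  /* first return, like vfork() returning 0 in the child */
        returns++;
        longjmp(saved_context, 42); /* does not return */
    case 42: /* second return, with the value chosen by longjmp() */
        returns++;
        return returns * 100 + 42;
    default:
        return -1;
    }
}
```

The call returns 242: setjmp() returned twice, and the second time it delivered 42, much as vfork() later delivers the child's ID to the parent.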
As already mentioned, when the child process calls _exit() or execv(), the vfork_child_done() function must unblock the parent thread. In addition, the child task must be prepared to run the required handler.
Code of the vfork_child_done() function.

    void vfork_child_done(struct task *child, void * (*run)(void *), void *arg) {
        struct task_vfork *vfork_data;

        if (!task_is_vforking(child)) {
            return;
        }
        task_vfork_end(child);
        task_start(child, run, NULL);
        thread_set_task(thread_self(), child->parent);

        vfork_data = task_resource_vfork(child->parent);
        ptregs_retcode_jmp(&vfork_data->ptregs, child->tsk_id);
    }

Code of the task_exit_callback() handler.

    void *task_exit_callback(void *arg) {
        _exit((int)arg);
        return arg;
    }

Code of the task_exec_callback() handler.

    void *task_exec_callback(void *arg) {
        int res;

        res = exec_call();
        return (void*)res;
    }
vfork_child_done() must also handle the case where exec()/_exit() is called without vfork(): then it simply returns, because there is no parent to unblock and the child can proceed right away. If the process was created with vfork(), the following is done: first the is_vforking flag is cleared on the child task with task_vfork_end(); then the main thread of the child task is finally started. Its entry point is the run function, which must be one of the handlers described earlier (task_exec_callback, task_exit_callback); they are needed precisely for implementing vfork(). After that, the thread's task is changed back: the parent is set instead of the child. Finally, control returns to the parent task from the vfork() call with the child's process ID as the return value. As mentioned above, this is done by calling ptregs_retcode_jmp().
Second implementation
The idea of the second implementation is to use the parent's stack from a new thread created together with the new task. This happens automatically if the registers previously saved in the parent thread are restored in the child thread. In this case the ordinary synchronization between threads can be used, as described in the already mentioned article. This is certainly more elegant, but also harder to implement, because while the parent thread is waiting, its child runs on the same stack. This means that for the duration of the wait the parent must switch to some intermediate stack, where it can safely wait for the child to call _exit() or exec*().
Code of the vfork_child_start() function for the second implementation.

    int vfork_child_start(struct task *child) {
        struct task_vfork *task_vfork;

        task_vfork = task_resource_vfork(task_self());

        task_vfork->stack = sysmalloc(sizeof(task_vfork->stack));
        if (!task_vfork->stack) {
            return -EAGAIN;
        }
        task_vfork->child_pid = child->tsk_id;

        if (!setjmp(task_vfork->env)) {
            CONTEXT_JMP_NEW_STACK(vfork_waiting,
                    task_vfork->stack + sizeof(task_vfork->stack));
        }

        task_vfork = task_resource_vfork(task_self());
        sysfree(task_vfork->stack);
        ptregs_retcode_jmp(&task_vfork->ptregs, task_vfork->child_pid);
        panic("vfork_child_start returning");
        return -1;
    }
Explanation of the code:
First, memory is allocated for the stack; then the pid (process ID) of the child is saved, since the parent will need it to return from vfork().
The setjmp() call makes it possible to return later to the place on the stack where vfork() was called. As already mentioned, the wait must happen on some intermediate stack; the switch is done with the CONTEXT_JMP_NEW_STACK() macro, which changes the current stack and transfers control to the vfork_waiting() function. That function starts the child and blocks the parent until vfork_child_done() is called.
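Running a function on a separately allocated stack can be tried on an ordinary POSIX system with the ucontext API. The sketch below is an analogy, not the CONTEXT_JMP_NEW_STACK() macro itself: makecontext()/swapcontext() stand in for the stack switch, waiting_stub() stands in for vfork_waiting(), and all names and the 64 KiB size are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024) /* arbitrary size chosen for the sketch */

static ucontext_t caller_context;
static ucontext_t waiting_context;
static char *alt_stack;
static int ran_on_alt_stack;

/* Stands in for vfork_waiting(): records which stack it is running on. */
static void waiting_stub(void)
{
    char marker;
    uintptr_t addr = (uintptr_t)&marker;
    uintptr_t base = (uintptr_t)alt_stack;

    ran_on_alt_stack = addr >= base && addr < base + ALT_STACK_SIZE;
    /* Returning from here resumes caller_context through uc_link. */
}

/*
 * Hypothetical helper: returns 1 if the stub ran on the intermediate stack,
 * 0 if it did not, and -1 on error.
 */
int run_on_intermediate_stack(void)
{
    int result = -1;

    alt_stack = malloc(ALT_STACK_SIZE);
    if (alt_stack == NULL) {
        return -1;
    }
    if (getcontext(&waiting_context) == 0) {
        waiting_context.uc_stack.ss_sp = alt_stack;
        waiting_context.uc_stack.ss_size = ALT_STACK_SIZE;
        waiting_context.uc_link = &caller_context;
        makecontext(&waiting_context, waiting_stub, 0);
        /* Switch stacks; execution continues here once the stub returns. */
        if (swapcontext(&caller_context, &waiting_context) == 0) {
            result = ran_on_alt_stack;
        }
    }
    free(alt_stack);
    return result;
}
```

While the stub runs, the original stack is left untouched by the caller, which is the property the waiting parent needs while the child is using that stack.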
Code of the vfork_waiting() function.

    static void vfork_waiting(void) {
        struct sigaction ochildsa;
        struct task *child;
        struct task *parent;
        struct task_vfork *task_vfork;

        parent = task_self();
        task_vfork = task_resource_vfork(parent);
        child = task_table_get(task_vfork->child_pid);

        vfork_wait_signal_store(&ochildsa);
        {
            task_vfork_start(parent);
            task_start(child, vfork_child_task, &task_vfork->ptregs);
            while (SCHED_WAIT(!task_is_vforking(parent)));
        }
        vfork_wait_signal_restore(&ochildsa);

        longjmp(task_vfork->env, 1);
        panic("vfork_waiting returning");
    }
As the code shows, the signal handling state is saved first. In fact, only the handler of the SIGCHLD signal, which is sent when the state of a child process changes, is overridden. Here it is used to unblock the parent.
New SIGCHLD handler.

    static void vfork_parent_signal_handler(int sig, siginfo_t *siginfo, void *context) {
        task_vfork_end(task_self());
    }

The signal table is saved and restored with the POSIX call sigaction().
Saving the handler.

    static void vfork_wait_signal_store(struct sigaction *ochildsa) {
        struct sigaction sa;

        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = vfork_parent_signal_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGCHLD, &sa, ochildsa);
    }

Restoring the handler.

    static void vfork_wait_signal_restore(const struct sigaction *ochildsa) {
        sigaction(SIGCHLD, ochildsa, NULL);
    }
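The same save, override, wait and restore pattern can be sketched on a regular POSIX system. This is not the project's code: fork() is used to get a child, sigsuspend() stands in for SCHED_WAIT(), and the names wait_for_child_signal() and on_sigchld() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_changed;

/* Temporary SIGCHLD handler: only records that a child changed state. */
static void on_sigchld(int sig)
{
    (void)sig;
    child_changed = 1;
}

/*
 * Hypothetical helper: returns 1 if the temporary handler observed the
 * child's exit, or -1 on error. The previous handler and mask are restored.
 */
int wait_for_child_signal(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    pid_t pid;

    sa.sa_handler = on_sigchld;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1) { /* save the old handler */
        return -1;
    }

    /* Block SIGCHLD so that it cannot arrive before we start waiting. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    child_changed = 0;
    pid = fork();
    if (pid == 0) {
        _exit(0);
    }
    if (pid > 0) {
        while (!child_changed) {
            sigsuspend(&wait_mask); /* atomically unblock SIGCHLD and sleep */
        }
        waitpid(pid, NULL, 0);
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL); /* restore the old handler */
    return pid > 0 ? (int)child_changed : -1;
}
```

Blocking the signal before the child is created closes the window in which SIGCHLD could be delivered between the flag check and the wait.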
After the signal handler is replaced, the parent task is marked as waiting, and it stays in this state until the child task is actually started by a call to _exit()/exec*(). The entry point of the child task is vfork_child_task(), which restores the previously saved registers and returns from vfork().
Code of the vfork_child_task() function.

    static void *vfork_child_task(void *arg) {
        struct pt_regs *ptregs = arg;

        ptregs_retcode_jmp(ptregs, 0);
        panic("vfork_child_task returning");
    }

When _exit() or exec*() is called, SIGCHLD is sent, and the handler of this signal clears the flag that marks the parent as waiting for the child to start. After that, the old SIGCHLD handler is restored and control returns to vfork_child_start() via longjmp(). Keep in mind that the stack frame of this function will have been damaged by the child process, so its local variables no longer hold what is expected. After the previously allocated stack is freed, vfork() returns the ID of the child task.
Testing vfork()
To check that vfork() behaves correctly, we wrote a set of tests covering several situations.
Two of them check that vfork() returns correctly when the child process calls _exit() and execv().
First test.

    TEST_CASE("after called vfork() child call exit()") {
        pid_t pid;
        pid_t parent_pid;
        int res;

        parent_pid = getpid();
        pid = vfork();
        test_assert(pid != -1);
        if (pid == 0) {
            _exit(0);
        }
        wait(&res);
        test_assert_not_equal(pid, parent_pid);
        test_assert_equal(getpid(), parent_pid);
    }

Second test.

    TEST_CASE("after called vfork() child call execv()") {
        pid_t pid;
        pid_t parent_pid;
        int res;

        parent_pid = getpid();
        pid = vfork();
        test_assert(pid != -1);
        if (pid == 0) {
            close(0);
            close(1);
            close(2);
            if (execv("help", NULL) == -1) {
                test_assert(0);
            }
        }
        wait(&res);
        test_assert_not_equal(pid, parent_pid);
        test_assert_equal(getpid(), parent_pid);
    }

Another test checks that the parent and child processes use the same stack.
Third test.

    TEST_CASE("parent should see stack modifications made from child") {
        pid_t pid;
        int res;
        int data;

        data = 1;
        pid = vfork();
        test_assert(pid != -1);
        if (pid == 0) {
            data = 2;
            _exit(0);
        }
        wait(&res);
        test_assert_equal(data, 2);
    }
However, we also wanted to check correct operation with a real third-party program, and for this we chose the fairly well-known dropbear package. During configuration it checks for fork(), and if fork() is not found, it can use vfork(). I should say right away that this was done to support uClinux, not to improve performance.
We configured the OS accordingly (so that dropbear uses vfork()) and successfully connected over ssh with both implementations.
P.S. We have also managed to implement fork() itself in our project without using an MMU; an article about it is currently being written.