CRIU: an ambitious new project for saving and restoring process state

CRIU (Checkpoint/Restore In Userspace) is an ambitious, fast-moving project that lets you save the state of a running program as a checkpoint and later restart the application from that point.
Checkpointing software has a wide range of uses. For example, OpenVZ uses a similar mechanism for live migration, and Parallels Virtuozzo uses one to quickly resume containers after a kernel upgrade. CRIU is already used in high-performance clusters to save intermediate results of computations, so that an application can resume after a failure.
This article describes how CRIU saves and restores program state, and why this project may prove more successful than its predecessors.

A bit of history

CRIU is not the first attempt to implement checkpoint/restore (C/R) for programs on Linux. There are at least two working C/R implementations: OpenVZ and the linux-cr project, led by Oren Laadan.
The problem with both projects is that they implement C/R almost entirely in kernel space. As a result, neither became part of the mainline Linux kernel: the code was too large and too complex.
The leader of the OpenVZ development team, Pavel Emelyanov, proposed changing the approach to C/R itself and moving most of the work to user space. The community received the idea well, and the CRIU project was born.

Saving Application State

An application may consist of a single running process or of many. CRIU supports both cases.
To collect information about the state of a process, the first thing that comes to mind is the mechanism debuggers use (ptrace). But it does not provide all the information. Part of the process state can be gleaned from the procfs file system and the prctl system call, but even that is not enough.
To obtain the missing information, CRIU injects executable code into the process (so-called parasite code). This mechanism was developed and kindly contributed by Tejun Heo, one of today's leading Linux kernel developers.
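As a rough illustration of the procfs part of this picture, the sketch below parses the text of /proc/<pid>/status into a dictionary. The helper name parse_status is hypothetical and this is not CRIU's own code; it only shows the kind of per-process information procfs exposes as plain text.

```python
def parse_status(status_text):
    """Parse the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a 'Key: value' line
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    # Linux only: inspect the current process.
    with open("/proc/self/status") as f:
        info = parse_status(f.read())
    print(info["Name"], info["State"], info["PPid"])
```

Fields such as State, PPid and Threads are available this way, but register contents or the exact state of open sockets are not, which is why procfs alone is not sufficient.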

Restoring Application State

To restore an application from a checkpoint, you must first recreate its process tree. For this purpose, a separate mechanism was added to the Linux kernel that allows processes to be created with specified identifiers. Once started, each process restores its own memory, open files, sockets, pipes, IPC objects, and so on.
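One detail of rebuilding the tree is ordering: a parent has to exist before its children can be forked from it. The sketch below computes such an order from a flat list of (pid, ppid) pairs. The record format and the name creation_order are assumptions made for illustration, not CRIU's image format or algorithm.

```python
from collections import defaultdict, deque


def creation_order(records):
    """Return PIDs ordered so that every parent precedes its children.

    records: iterable of (pid, ppid) pairs (a hypothetical checkpoint format).
    A process whose parent is not in the checkpoint is treated as a root.
    """
    records = list(records)
    pids = {pid for pid, _ in records}
    children = defaultdict(list)
    roots = []
    for pid, ppid in records:
        if ppid in pids:
            children[ppid].append(pid)
        else:
            roots.append(pid)

    order = []
    queue = deque(sorted(roots))
    while queue:  # breadth-first walk from the roots
        pid = queue.popleft()
        order.append(pid)
        queue.extend(sorted(children[pid]))
    return order


if __name__ == "__main__":
    print(creation_order([(105, 100), (100, 1), (110, 105), (106, 100)]))
```

In a real restore, each step in this order would correspond to forking a child that receives its saved identifier.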
In practice, however, things are not that simple, because resources may be shared between several processes.
Regions of a process's address space can be private (MAP_PRIVATE) or accessible to several processes at once (MAP_SHARED). The first type is fairly easy to restore (as long as you set aside copy-on-write, which is beyond the scope of this article).
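The difference between the two mapping types can be observed with a small experiment. This is a minimal sketch for Linux, using Python's mmap module and os.fork; the function name child_write_visible is hypothetical. A child process writes to an anonymous region and the parent checks whether it sees the change.

```python
import mmap
import os


def child_write_visible(flags):
    """Map one anonymous page with the given flags, let a forked child
    overwrite its first byte, and report whether the parent sees the write."""
    region = mmap.mmap(-1, mmap.PAGESIZE, flags=flags | mmap.MAP_ANONYMOUS)
    region[0:1] = b"A"
    pid = os.fork()
    if pid == 0:
        # Child: modify the region and exit immediately.
        region[0:1] = b"B"
        os._exit(0)
    os.waitpid(pid, 0)
    seen = region[0:1]
    region.close()
    return seen == b"B"


if __name__ == "__main__":
    print("MAP_SHARED :", child_write_visible(mmap.MAP_SHARED))
    print("MAP_PRIVATE:", child_write_visible(mmap.MAP_PRIVATE))
```

With MAP_SHARED the parent observes the child's write; with MAP_PRIVATE the child gets its own copy of the page. A restore tool therefore has to make sure that a shared region is really the same memory in every restored process, not just memory with the same contents.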
What is the problem in the second case? If the region of the address space is mapped to a file, everything is fine: standard Linux mechanisms are enough to restore it.
But what if the region is anonymous (MAP_ANONYMOUS), that is, backed only by RAM rather than by a file? To support restoring such regions, we had to modify the kernel once more and expose each mapping in procfs: the kernel creates a file /proc/self/map_files/<start_addr>-<end_addr> for each shared memory region. Thanks to these files, anonymous shared regions can be restored in the same way as file-backed ones.
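To make the naming scheme concrete, the sketch below takes the text of /proc/<pid>/maps, picks out the shared mappings (an "s" in the fourth position of the permissions column) and builds the corresponding map_files paths. The function name shared_region_paths is hypothetical, and the assumption that map_files names are written as unpadded hexadecimal should be checked against the kernel documentation.

```python
def shared_region_paths(maps_text, pid="self"):
    """Build /proc/<pid>/map_files/<start>-<end> paths for shared mappings.

    maps_text: contents of /proc/<pid>/maps.
    """
    paths = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        addr_range, perms = fields[0], fields[1]
        # The permissions column looks like 'rw-s' (shared) or 'rw-p' (private).
        if len(perms) < 4 or perms[3] != "s":
            continue
        start_hex, end_hex = addr_range.split("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        # Assumption: map_files entries use hex addresses without zero padding.
        paths.append(f"/proc/{pid}/map_files/{start:x}-{end:x}")
    return paths


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        for path in shared_region_paths(f.read()):
            print(path)
```

Opening one of these paths gives a file descriptor for the region, which can then be mapped with MAP_SHARED like any ordinary file. Access to map_files may require elevated privileges, depending on the kernel version.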
The next problem in line was open files, which can also be shared by several processes. Here the ability to pass file descriptors over Unix sockets came in handy: one of the processes opens the file and sends the descriptor to the others.
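Descriptor passing over a Unix socket uses the SCM_RIGHTS ancillary message. The sketch below shows the mechanism within a single process for brevity; the helper names send_fd, recv_fd and demo are hypothetical, and in a real restore the two socket ends would belong to different processes.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd):
    """Send one file descriptor over a Unix socket using SCM_RIGHTS."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"F"], ancillary)  # at least one byte of ordinary data is sent


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo():
    """Open a file once, pass its descriptor through a socket pair,
    and read the file contents through the received descriptor."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as f, left, right:
        f.write(b"hello")
        f.flush()
        send_fd(left, f.fileno())
        received = recv_fd(right)
        try:
            # Both descriptors share one file offset, so rewind first.
            os.lseek(received, 0, os.SEEK_SET)
            return os.read(received, 5)
        finally:
            os.close(received)


if __name__ == "__main__":
    print(demo())
```

The received descriptor refers to the same open file as the original, including its offset, which is what makes this suitable for recreating files that were shared before the checkpoint.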
So, the memory is restored and the files are open. How is control handed back to the process? This relies on the sigreturn() system call: when a signal arrives, the kernel saves the state of the process and transfers control to the signal handler, which calls sigreturn() when it finishes. So, to resume a process from the desired point, it is enough to prepare the process state in the format sigreturn() expects and then call sigreturn().
At that point the process is restored and continues running.

Interesting facts

CRIU can save and restore TCP connections, which makes it usable for live migration. The user notices only a slight delay in network activity while the application, with all its connections, is moved to another server.
CRIU saves process data to disk in Google's Protocol Buffers format. Libraries for working with this format exist for most popular languages.
During development, all CRIU features are continuously checked by a built-in test suite. It uses a reworked and improved framework borrowed from the OpenVZ and Virtuozzo development projects.
CRIU is being developed by Parallels as part of the effort to merge OpenVZ code into the mainline Linux kernel.

Goals

The main goal of the project for the near future is to learn how to save and restore the state of Linux Containers (LXC). This requires saving and restoring terminals, pending signals (received but not yet handled), network namespaces, and the file hierarchy (mount namespace; under development).

Links

ru.wikipedia.org/wiki/CRIU
ckpt.wiki.kernel.org/index.php/Main_Page
wiki.openvz.org/Checkpointing_and_live_migration
www.parallels.com
code.google.com/p/protobuf

Source: https://habr.com/ru/post/148413/
