When reboot time matters, or why IBM uses CRIU on mainframes

In a world where microservices are tipped as the bright future, it may seem strange to work on technologies that update code without a reboot. After all, microservices and containers are much easier to "kill" and re-create. Nevertheless, we continue to work on CRIU (Checkpoint/Restore In Userspace), our live migration system, and IBM is actively helping us with it. Why? Let's try to explain.

With virtualization everywhere and container architectures converging and succeeding, patching starts to look like a relic. Why install updates and reboot when you can simply create the container again? That is true for user applications and services, development and testing. But practice shows that the infrastructure all of this runs on needs a completely different approach. It is the stability and constant availability of heavy services, such as databases, that lets microservices start at any time and use any data.

Clearly, systems that take a long time to start and warm up should not be restarted often, and ideally never. The more powerful a system is and the more microservices depend on it, the more costly it is to stop it for a reboot. One solution to this problem is the ReadyKernel technology, which installs updates to a Linux host OS running many virtual machines and containers without rebooting it. Another way to reduce service downtime is offered by our CRIU project.

CRIU becomes a standard

Despite the doubts CRIU met early in its development as an open source tool (then again, when Gates first presented a tablet, people laughed at that too), today CRIU is integrated into OpenVZ, Docker, LXC and CoreOS containers, is included in Linux distributions such as Ubuntu, Debian, OpenSUSE, Altlinux and several others, and is supported by developers from various companies, including IBM. Curiously, it was Big Blue that made one of the largest contributions to CRIU: today the tool works on several platforms, x86_64, ARM, aarch64, PPC64 and s390, and two of them, PowerPC64 and s390, are IBM's own creations. Support for the latter was announced in the summer of 2017.

To explain why one of the largest developers of hardware platforms and software needs such a tool, we need to look a little at the essence of the project itself. CRIU lets you "freeze" an application and resume it later on another host or in another container. Used correctly, the application should not even notice that it was moved, and continues working as if nothing had happened. As already mentioned, microservices do not need this at all, but it turns out to be very useful for other workloads, including those that run on mainframes.
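To make the "freeze and resume" idea concrete, here is a minimal sketch that only builds the command lines for a checkpoint and a later restore. The helper names `dump_command` and `restore_command` and the paths are our assumptions for illustration; the options used (`--tree`, `--images-dir`, `--leave-running`) are standard options of the `criu` command-line tool, but a real setup usually needs more of them depending on the application.

```python
import shlex


def dump_command(pid, images_dir, leave_running=False):
    """Build the argv for checkpointing the process tree rooted at `pid`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if leave_running:
        # Keep the original process running after the images are written.
        cmd.append("--leave-running")
    return cmd


def restore_command(images_dir):
    """Build the argv for restoring a process tree from saved images."""
    return ["criu", "restore", "--images-dir", images_dir]


if __name__ == "__main__":
    # Example values only; 1234 and /tmp/ckpt are placeholders.
    print(shlex.join(dump_command(1234, "/tmp/ckpt")))
    print(shlex.join(restore_command("/tmp/ckpt")))
```

For a migration, the image directory would be copied to (or shared with) the destination host between the two steps, and the restore command would be run there.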

The s390 microprocessor architecture for high-performance servers is unique; IBM develops it for its mainframe line. Multiprocessor, multithreaded systems handle huge volumes of data, which leaves its mark on the architecture of the OS and applications. In the summer of 2017, IBM developers contributed patches that make it possible to use CRIU on s390. CRIU is a low-level tool whose code sits close to the kernel, so it has to be adapted to each new architecture. For CRIU to work, support for platform-specific functionality had to be implemented. From the simple to the complex, IBM developers added support for system calls, architecture-specific data types and descriptors of the process virtual address space, the necessary compiler settings, images for registers, the trampolines needed for the parasite code that we inject into a process to "freeze" it, and the architecture-specific handling of TLS and GOT. You can see part of that work here:

#include "common/asm/linkage.h"

    .section .head.text, "ax"

/*
 * Entry point for parasite_service()
 *
 * Addresses of symbols are exported in auto-generated criu/pie/parasite-blob.h
 *
 * Function is called via parasite_run(). The command for parasite_service()
 * is stored in global variable __export_parasite_cmd.
 *
 * Load parameters for parasite_service(unsigned int cmd, void *args):
 *
 * - Parameter 1 (cmd) : %r2 = *(uint32 *)(__export_parasite_cmd + pc)
 * - Parameter 2 (args): %r3 = __export_parasite_args + pc
 */
ENTRY(__export_parasite_head_start)
    larl    %r14,__export_parasite_cmd
    llgf    %r2,0(%r14)
    larl    %r3,__export_parasite_args
    brasl   %r14,parasite_service
    .long 0x00010001    /* S390_BREAKPOINT_U16: Generates SIGTRAP */
__export_parasite_cmd:
    .long 0
END(__export_parasite_head_start)

Trampoline for injecting parasite code on s390

Users of IBM platforms deal with a fair amount of software that is too heavy to handle in the "kill and re-create" style. For massive applications it is far more convenient to save the state of services, for example so that work can be resumed after a power loss. The ability to migrate containers "live" makes it possible to free up servers for maintenance, balance load, and so on.

And this is not only about mainframes. IBM's participation in the CRIU project is not limited to the s390 architecture. A few years ago we received patches from IBM adding PPC64 support. These IBM systems are "simpler" workstations and servers, not mainframes. But the most interesting contribution IBM developers have made to the CRIU project is the "lazy migration" technology.

It works as follows: the container is moved from one host to another without the contents of its memory. This reduces the size of the images by an order of magnitude and is very effective for applications that keep huge amounts of data in memory. Take a JVM, for example: its full image can take tens of megabytes (not counting the memory allocated by the program running in it), while without memory contents it is just a few tens of kilobytes. As a result, migration is several times faster and the pause in service is shorter. The essence of IBM's additions is the ability to access memory remotely and migrate it asynchronously, on demand.
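The idea of "move first, fetch memory later" can be illustrated with a toy model. This is not CRIU code: the `LazyMemory` class and its methods are hypothetical names, pages are plain Python byte strings, and the "remote host" is just a dictionary. In CRIU itself the on-demand part relies on the Linux userfaultfd mechanism, which this sketch does not attempt to model.

```python
class LazyMemory:
    """Destination-side view of process memory after a lazy restore (toy model)."""

    def __init__(self, source_pages):
        # Stands in for the memory that is still held on the source host.
        self._source = source_pages
        # Pages that have already been transferred to the destination.
        self._local = {}
        # Counts on-demand transfers triggered by first access to a page.
        self.remote_fetches = 0

    def read(self, page_no):
        """Return a page, pulling it from the source on first access."""
        if page_no not in self._local:
            self._local[page_no] = self._source[page_no]
            self.remote_fetches += 1
        return self._local[page_no]

    def prefetch_remaining(self):
        """Copy all pages not touched yet (the asynchronous background
        transfer) and return how many pages were copied."""
        missing = [n for n in self._source if n not in self._local]
        for n in missing:
            self._local[n] = self._source[n]
        return len(missing)


if __name__ == "__main__":
    source = {n: bytes([n]) * 4096 for n in range(8)}  # 8 pages of 4 KiB
    mem = LazyMemory(source)
    mem.read(3)
    mem.read(3)  # second access is served locally
    mem.read(5)
    print(mem.remote_fetches, mem.prefetch_remaining())
```

The restored process starts running with nothing in `_local`; only the pages it actually touches cause an immediate transfer, and the rest can follow in the background.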

Nevertheless, there are many cases where the system does have to be rebooted, and here the ability to stop an application is also useful. CRIU lets you stop a container, reboot the system, and start the container on it again. This solves the patching problem in the difficult cases where the system cannot be updated without a reboot.

Conclusion

The broader support for the CRIU project means that today any developer can "freeze" and "live migrate" applications on five different architectures. IBM's contribution made it possible not only to use CRIU on mainframes and PPC64 servers, but also to use the "lazy migration" mechanisms on other platforms.

Moreover, these changes led us to create a separate library, Compel, which lets you infect processes with parasite code and make them execute specific instructions. Today Compel is used in the CRIU project as well as in the new live patching system. We will cover that system and the Compel library itself in the next post.

Source: https://habr.com/ru/post/339286/
