
Parallel execution of dependent tasks and synchronization with condition variables in the shell

How to synchronize parallel shell processes using named pipes (FIFO files) as condition variables. How to execute dependent tasks in parallel, in topological order, with a minimum of tools: a POSIX shell, mkfifo, and a POSIX kernel. How a parallel start speeds up the boot of embedded systems and *BSD (the FreeBSD rc stage drops from 27 to 7 seconds) and the start of user containers in Docker, LXC, and jail. How it improves uptime in Jet9 failover clusters.

1 Why tasks with dependencies are needed and what a parallel start gives
2 Parallel vs sequential
3 Performing tasks with dependencies
3.1 rcorder, topological sorting and sequential execution
3.2 upstart, events and asynchronous execution
3.3 systemd, dependencies and asynchronous execution
3.4 runit, parallel execution and synchronization when performing a single task
4 Supervisors or Schedulers
4.1 multitask-flow, parallel execution of dependent tasks
5 Means for synchronizing parallel shell processes
5.1 Lock file as mutex
5.2 FIFO file as a condition variable
5.2.1 Constructs for a condition variable in a shell
5.2.2 Requirements for the working environment
6 The design of multitask-flow
6.1 Parallel execution and synchronization on FIFO files
6.2 How multitask-flow can and cannot be used
7 Related Links

Why tasks with dependencies are needed and what a parallel start gives


Typical tasks with pronounced dependencies are booting a server or changing its operation mode, and starting/stopping user containers. Most of our Linux and FreeBSD machines boot not into a standard distribution installation but into our own operating environment: a compact image on a RAM disk containing the system programs needed for the machine's main profile plus assorted auxiliary utilities. Starting a service, initializing a device, applying settings are separate tasks that are present or absent depending on the server's purpose, and these tasks must be performed in the correct order. For example, if the network configuration starts before the network card driver is loaded, the network will not work. Different applications run in user containers, and the correct operation of some may depend on the operation of others, which is why their start and stop order matters. For example, if an application server is started before the database server, it may immediately stop because it cannot connect to the database.
There are now two generally accepted approaches to starting a system. The first is to specify the execution order explicitly: either list the tasks in execution order in a shell script, or number the tasks, put the number into each task's file name, and execute them sorted by file name:

S23ntp S25mdadm S91apache2 


The second approach is to specify dependencies between the start scripts, that is, to list which conditions are needed before a task can run and which conditions the task establishes after completion:

 # REQUIRE: apache
 # PROVIDE: nginx


Although specifying dependencies is more complicated than numbering, it is much more convenient and is now used almost everywhere.

The main drawback of explicit numbering is that the developer must first identify the dependencies on other tasks, then look up the numbers of those dependencies and pick a suitable number for the new task, and in the process it may turn out that the existing numbering is unfortunate and has to be changed for some of the scripts. With execution by dependencies, the main advantage is how simple it is to add a task to an already established process: for a new task it is enough to list its dependencies on other tasks. Another benefit is the possibility of parallel execution, because only dependent tasks have to run one after another, while tasks that do not depend on each other can run in parallel.

Parallel vs sequential


Parallel execution of tasks has two key advantages:

- the total execution time decreases, since independent tasks run at the same time;
- a problem in one task delays only the tasks that depend on it, while independent branches keep running.

As for the second advantage, many are probably familiar with the situation where sendmail takes a long time to start because of network problems and waits tens of minutes for DNS responses, making it impossible to log in to the server, because sshd came later than sendmail in the start sequence. With a parallel start, if sshd does not declare a dependency on sendmail, sendmail will not hold it up.

Keep in mind that the absence of an explicitly specified dependency does not mean the absence of a dependency altogether: a dependency can be indirect, for example when several tasks access a resource that does not tolerate simultaneous use (writing to one file, changing a program's configuration, initializing a device). With sequential execution such a situation is impossible and so may go unaccounted for, but with parallel execution it may be necessary to identify such resources and prevent their simultaneous use. In practice this is rare, since most tasks ultimately run system utilities that take care of locking themselves, or the kernel does it when its resources are accessed.

Performing tasks with dependencies


rcorder, topological sorting and sequential execution


The simplest tool for execution by dependencies, rcorder(8), appeared in NetBSD 1.5 in 2000. The rc-scripts describe the dependencies between one another with the PROVIDE, REQUIRE, and BEFORE annotations. The rcorder utility takes a list of scripts, extracts the relations from them, builds a dependency graph, and prints the scripts in the order of a topological sort of that graph.

Running the scripts in this order guarantees that dependent scripts start only after the scripts they depend on have completed. The scripts are executed one at a time; a parallel start is not implemented. Synchronization of parallel processes is therefore not needed, and the implementation is simple and reliable.
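
In shell terms, such a sequential start is essentially one loop. A minimal sketch of it (FreeBSD's /etc/rc does roughly the same, with extra bookkeeping):

 # Run every rc-script, one at a time, in topologically sorted order.
 for script in $(rcorder /etc/rc.d/*); do
     "$script" start
 done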

A similar mechanism appeared later in Debian Lenny, where the Required-Start, Required-Stop, and Provides annotations were added to SysV init scripts, describing the dependencies of the different actions (start and stop). The execution order is then derived from these dependencies, numbers are prepended to the script names, and the scripts are placed in the start directory:

 S23ntp S25mdadm S91apache2 


upstart, events and asynchronous execution


Upstart appeared in 2006 in Ubuntu and was later adopted by many other distributions. Instead of specifying relations between tasks, a condition on events is described here with the start on directive: an event, or a logical expression over events, under which the task should be executed. For example, for services that require the network, you can specify the event NETWORK_AVAILABLE as the start condition, and they will be launched after someone emits the NETWORK_AVAILABLE event to the system. A condition can express not only a dependency on some other task but also more complex things, for example the runlevel or the state of other tasks.

With this execution model, tasks run asynchronously, at the moment when the event occurred and the condition involving it became true. If several tasks are waiting for one event, they are executed simultaneously. Event dispatch and task execution are handled by the init service. All the advantages are present: a fast start and the localization of problem branches.
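
For illustration, a job description might look like this; the job and its daemon are invented for the example, and NETWORK_AVAILABLE is the event from the paragraph above:

 # Hypothetical upstart job: /etc/init/myapp.conf
 # Runs once NETWORK_AVAILABLE has been emitted on runlevels 2-5.
 start on NETWORK_AVAILABLE and runlevel [2345]
 stop on runlevel [016]
 respawn
 exec /usr/sbin/myapp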

systemd, dependencies, and asynchronous execution


Systemd appeared later, took on an even wider range of functions, captured upstart's niche, and is now used as the subsystem for configuration, management, service execution, and logging on most popular Linux distributions.

Unlike upstart, systemd uses dependencies, not events, to start services. These dependencies are richer and more varied than in rcorder: there are ordering relations (start a task before or after another task) and requirement relations (another task is desirable, must be present, or must be absent). Tasks (units) are divided into several types: services (daemons), devices, file system mounts, and so on.

Tasks are executed in parallel, as in upstart. The way dependencies are described makes it possible to check them at run time for deadlocks, that is, cyclic dependencies (a task that, through a chain of other tasks, ultimately depends on itself). Where possible, systemd tries to resolve such problems by removing non-essential jobs (Wants=) from the execution transaction, and it also checks the conflict dependencies (Conflicts=), not allowing conflicting tasks to start.
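
A sketch of a unit showing both kinds of relations (all unit and service names here are invented):

 # Hypothetical unit: /etc/systemd/system/myapp.service
 [Unit]
 Description=Example application server
 # Ordering: start only after these units have been started
 After=network.target mysqld.service
 # Soft requirement: pulled into the transaction, but may be
 # dropped by systemd if it creates a dependency cycle
 Wants=mysqld.service
 # Conflict: this unit and myapp-legacy never run together
 Conflicts=myapp-legacy.service

 [Service]
 ExecStart=/usr/sbin/myapp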

runit, parallel execution and synchronization when performing a single task


Runit is positioned as a lightweight replacement for init and the boot script system, with a clear design and simple configuration. It is also convenient for supervising services or groups of services, for example individual application servers and databases. Because of its simplicity and portability, runit is often used on embedded systems or for managing web applications.

There are no dependencies as such in runit. Instead, a service's run script invokes the sv command to start the services it needs. sv itself checks whether a service is already running and prevents duplicate launches of the same service, as in the sketch below. This is enough to let services start in parallel, but it does not allow detecting the cyclic dependencies that lead to deadlocks.
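
A minimal sketch of such a run script (the service layout is invented; sv start is idempotent, so an already running dependency is simply left alone):

 #!/bin/sh
 # Hypothetical runit run script: /etc/sv/nginx/run
 sv start mysqld || exit 1
 exec /usr/sbin/nginx -g 'daemon off;'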

runit is designed to manage services, that is, constantly running daemons: it watches the process and restarts it when needed. But its logic fits poorly with one-time tasks with dependencies, for example obtaining network settings as a DHCP client.

Supervisors or Schedulers


Of the tools above, only rcorder is a pure scheduler for running dependent tasks. All the others are designed to start services with dependencies and then watch over the service's existence; that is, they are service supervisors. systemd and upstart also support one-time tasks (Type=oneshot and task, respectively) while preserving all the dependency logic, and thus let you conveniently manage the boot and operation of a computer in considerable detail. The more formal way of describing dependencies in systemd, and the use of targets as a substitute for operation modes (runlevels), is a major evolution after upstart. But both upstart and its successor systemd are complex and cumbersome tools that are hard to embed into an operating environment that differs from their main target.

The ambition to cover all aspects of the operating system, together with the monolithic design, makes it difficult or impossible to embed systemd into an environment where it would carry only a narrow responsibility. For this reason, to manage individual services and on embedded systems, people still use runit, daemontools, eye, supervisord, or their own scripts that manage daemons with the particulars of their work in mind.

Both upstart and systemd, along with runit, are presented as a means of keeping services (daemons) operational. There is a good deal of cunning in this claim, since by keeping a service operational they mean restarting a fallen service. Here one can trace the genetic link with init, one of whose purposes was to run getty, the terminal-servicing process. Since terminals were the main means of working with Unix, terminal operability could be called the second most critical function after the operability of the kernel. Accordingly, the most important and very first process was given the job of restarting getty processes, both when a terminal was released normally and when getty crashed. Why getty died or the terminal was released was irrelevant; what mattered was that the terminal had no getty and the ability to log in to the computer was lost.

The evolutionary descendants of init undertook to restart not only getty but all constantly running services. Yet the supervision of services remained at the same primitive level: if the daemon fell, it is simply restarted. More serious monitoring, both of the quality of service (absence of internal errors, timely handling of requests) and of technical parameters (memory size, CPU usage), still has to be done by other means. So systemd and its analogues in their pure form are poorly suited to ensuring reliable operation, and when integrating them with other monitoring and management tools, one must additionally account for the competing influence of the supervising systems.

In some cases, automatically restarting a fallen service is outright dangerous: if the service is in any way involved in storing data, then unsaved data or a broken internal structure may introduce errors into the stored data. And if, after an emergency restart, work continues without checking data integrity and repairing possible damage, the errors will multiply. So for simple and reliable operation, the more correct approach is one where the monitoring of a service and the reaction to its failure are tailored to the particulars of each service, rather than done for all of them with the same comb.

Using large and actively growing complexes of the systemd type carries another risk: either you tie yourself to a chosen version of the program and struggle to backport bug fixes, or you live with the constant probability of major rework when upgrading to a new version.

multitask-flow, parallel execution of dependent tasks


To start our various equipment and control its operation modes, a minimalist tool for executing tasks with dependencies was needed: one that would not try to take on the extra functions of state control and restarting, but would simply do its main job well (execution by dependencies), and one that would embed easily into different environments. Besides convenient integration, portability and ease of maintenance were required. With some improvements, it later found other applications: starting user containers, managing cluster operation modes, and starting and managing the operation modes of other subsystems.

The obvious option was a C program. Dependencies are extracted from the scripts themselves or produced by an external script, and are fed in a form compatible with tsort(1). From this list a dependency digraph is built and checked for cycles, after which tasks are executed as each task's dependency counter drops to zero. Unlike upstart and systemd, which require Linux kernels 2.6.24 and 3.0 respectively for their needs, here the usual fork-exec-wait scheme and any POSIX kernel are sufficient. But later a more convenient alternative was chosen: a utility plus a small POSIX shell library, jet9-multitask-flow.
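
For illustration, the tsort(1)-compatible form is just a list of "prerequisite dependent" pairs; a sketch with invented service names:

 # Each pair means "the first must complete before the second";
 # tsort prints one valid sequential order: network, mysqld, nginx.
 printf '%s\n' \
     'network mysqld' \
     'network nginx' \
     'mysqld nginx' | tsort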

Using the shell brings obvious advantages: portability, that is, operation in any environment that has a POSIX shell, and flexibility, which lets the scheduler be adapted to the specifics of the scripts it manages - the way dependencies are described, the way tasks are executed. There are also drawbacks: the language makes it easy to introduce hard-to-find errors and, being interpreted, it runs slowly. Because the code is small, the first problem turns out to be insignificant, and execution speed is not a problem for our application either: the time spent on the control functions is negligible compared to the time the tasks themselves take.

It looks paradoxical, but executing a set of several hundred tasks with the scheduler written in sh turns out to be several times faster than with a similar scheduler written in C. There is a rational explanation: shell scripts that use a large shared library run faster if the library is loaded once into the parent process of the shell scheduler and all the hundreds of scripts are then executed in subshells. If the C scheduler runs hundreds of scripts via exec(), far more time is spent initializing the shell and loading all the libraries for each script.

Means for synchronizing parallel shell processes


Lock file as mutex


In the shell dialects that can start parallel processes (called background processes in this domain), there is also the wait command. Executed in the parent process, it waits for the completion of its child processes. The analogy from multithreaded programming is creating several threads and then waiting, via join, for the completion of all threads created from the main thread. This is a synchronization tool, but often not a very suitable one: it waits for all descendant processes at once, and it cannot wait for processes that are not its descendants.
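
A minimal sketch of this built-in mechanism:

 #!/bin/sh
 # Two independent "tasks" as background processes;
 # wait blocks until all children of this shell have exited.
 sleep 2 &
 sleep 1 &
 wait
 echo "both background tasks finished"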

Lock-file mutexes have long been used in shell scripts to prevent simultaneous access to a resource. As a rule, an atomic operation is used for this: creating a file, or creating a directory via mkdir. If the operation fails, the resource is considered busy and the script aborts. With files, the noclobber flag can be used to forbid writing over an existing file; creating a directory over an existing one is impossible anyway. But the atomicity of these operations is guaranteed not by the language or its runtime but by the file system, so the behavior may differ under different conditions.
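
A sketch of the mkdir variant (the lock path is invented):

 #!/bin/sh
 # mkdir is atomic: of several competing scripts, exactly one succeeds.
 lock_dir=/var/run/myjob.lock
 if ! mkdir "$lock_dir" 2>/dev/null; then
     echo "already running" >&2
     exit 1
 fi
 trap 'rmdir "$lock_dir"' EXIT   # release the lock on any exit
 # ... critical section ...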

This scheme is simple and serves well in most cases where duplicate execution of a script must be forbidden, for example for long-running scripts started frequently from cron, or at daemon start. If the goal is not to abort but to wait until the resource is released, the usual technique is a periodic check for the absence of the lock - in effect, a spinlock with sleeping: if the file or directory exists, the script sleeps for a while and tries again, as sketched below. A spinlock without sleeping is pointless and harmful in the shell: given the way shell scripts are used, the lock may be held for a fraction of a second or for tens of seconds, and a non-stop check would load both the processor and the OS kernel, taking resources from everyone and slowing everything down, including the competing process.
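
The same lock, but waiting instead of aborting - a sketch of the sleeping spinlock:

 #!/bin/sh
 # Poll the lock; no CPU is burned while the resource is busy,
 # but the reaction to its release is delayed by up to one sleep.
 lock_dir=/var/run/myjob.lock
 while ! mkdir "$lock_dir" 2>/dev/null; do
     sleep 1
 done
 trap 'rmdir "$lock_dir"' EXIT
 # ... critical section ...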

Sleeping saves the processor but slows the reaction to the release of the lock. Previously, when the sleep command understood only integer arguments, the minimum sleep was 1 second. Nowadays the value can be smaller, but in any case it is some constant, which may be too large for short actions taking a few milliseconds. For a quick reaction, we therefore want a mechanism that waits for the resource to be released without sleeping for a fixed time, yet also without active waiting.

One way to stop and then resume asynchronously is to send signals, for example to sleep and deliver SIGALRM. The method is available to the shell as well, but it requires additional non-trivial actions so that the lock owner learns about the waiting processes. File locks via flock(2) and fcntl(2) are not directly accessible from a pure POSIX shell either and, in any case, provide mutual exclusion rather than the wait-and-notify semantics needed here.

FIFO file as a condition variable


Unix named pipes are one of the oldest means of interprocess communication. An ordinary Unix pipe, created with pipe(2), passes a byte stream from one process to another, but it has no name and can be shared only by related processes: a parent and the children that inherited its descriptors. Unlike a pipe(2) pipe, a named pipe (FIFO file) is created in the file system with mkfifo(2); any process that knows the FIFO file's name can open it and exchange data through it with other processes.

A FIFO file has a useful blocking behavior: a process that opens it for reading blocks until some other process opens it for writing, and vice versa. A read from the pipe blocks until data appears or until all writers close their end, at which point the readers receive end-of-file. It is this behavior that makes a FIFO file suitable as a waiting primitive: a process can be put to sleep not for a fixed interval but exactly until another process performs the releasing action.

The scheme is as follows: the lock is a FIFO file; a process that has to wait for a condition opens the FIFO file for reading and blocks; the process that establishes the condition opens the FIFO file for writing and immediately closes it, after which all the blocked readers continue at once. While waiting, the processes consume no CPU time.

Thus, waiting on a FIFO file gives the wait operation of a condition variable, and releasing the waiters gives broadcast (notifyAll in Java terms): all waiting processes are woken at once.

Constructs for a condition variable in a shell


Initializing the condition variable (creating the lock) is the creation of a FIFO file:

 mkfifo $lock_file 


Waiting on the condition (waiting for the lock to be released) is a read from the FIFO file:

 read none < $lock_file 


Notifying all waiting processes (broadcast) must avoid a race condition with processes that begin waiting at that same moment, so the FIFO file is first renamed, then opened for writing and reading, and then removed:

 mv $lock_file $lock_file.$$
 : <> $lock_file.$$
 rm -f $lock_file.$$



Requirements for the working environment


A FIFO file is created by the mkfifo(2) system call (part of POSIX.1) or by the mkfifo(1) utility (part of POSIX.2); everything else is done by means of the POSIX shell. This works on Linux and FreeBSD and, presumably, on any other Unix-like system.

There is a caveat: to unblock the waiting processes, you need to open the FIFO file for writing and immediately close it. But if there is not a single process reading from the pipe (nobody is interested in the resource), the process that opens the file for writing will itself be blocked. The blocking will last until a reader appears, and if nobody appears, it will last forever. On Linux and FreeBSD this problem is solved by the fact that a FIFO file can be opened for writing and reading simultaneously, in which case the process is never blocked. The logic is that a FIFO file does not block a process on open if it is open for both reading and writing, even if that is one and the same single process. POSIX does not describe this behavior at all, but the Linux kernel (fs/fifo.c) mentions this case separately, with a justification of why such an open does not block:

 /*
  * O_RDWR
  * POSIX.1 leaves this case "undefined" when O_NONBLOCK is set.
  * This implementation will NEVER block on a O_RDWR open, since
  * the process can at least talk to itself.
  */


FreeBSD behaves the same way, although its sources do not comment on this case separately.

On other systems the behavior may differ; for example, there are reports that on AIX an O_RDWR open of a FIFO file does not work this way. So before using FIFO-file synchronization on another system, check how it handles opening a FIFO file for reading and writing at the same time.
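
Putting the constructs together, a self-contained sketch of the whole scheme (it relies on the non-blocking O_RDWR open discussed above; the file name is arbitrary):

 #!/bin/sh
 # Two waiters block on one FIFO; a single broadcast releases both.
 cond=/tmp/cond.$$
 mkfifo "$cond"

 ( read none < "$cond"; echo "waiter 1 released" ) &
 ( read none < "$cond"; echo "waiter 2 released" ) &

 sleep 1                # crude: give both waiters time to block
 mv "$cond" "$cond.x"   # new waiters will no longer find this FIFO
 : <> "$cond.x"         # open read-write and close: wakes all readers
 rm -f "$cond.x"
 wait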

The design of multitask-flow


Parallel execution and synchronization on FIFO files


The scheduler and the synchronization primitives are implemented as a small POSIX shell library with a driver utility. Its functions fall into three groups: synchronization, task scheduling, and interaction with the managed scripts.

Before the run, a condition variable - a FIFO file - is created for every task. As long as a task has not completed, its FIFO file exists, and everything that depends on the task blocks reading from it.

All tasks are started at once, each in its own background subshell.

Each subshell first waits, one after another, on the condition variables (FIFO files) of all the tasks it depends on. Once all the dependencies are satisfied, the task body is executed, and then the subshell performs a broadcast on its own FIFO file, releasing every task waiting for it. As a result, the tasks complete in the topological order of the dependency graph, without a central dispatcher handing out work.

This method is somewhat wasteful, since a separate subshell, and therefore a separate process, is launched for each task; to execute a stream of hundreds of tasks, hundreds of processes are launched. But the expense is not as large as it may seem: the code and data segments are almost entirely shared with the main process, and only the stack and a small amount of other allocated memory is private to each process. For Linux on 64-bit Intel with dash on a 3.16 kernel this is about 160 KB per process; for FreeBSD in a similar situation, about 120 KB. That is, 100 simultaneously waiting tasks require 12-16 MB.

Blocking, waiting, and wake-up notification are implemented on FIFO files exactly as described earlier and are packaged into separate functions (dependency_init, dependency_wait, dependency_mark). Besides this wait plus notifyAll/broadcast pair, a mutual exclusion primitive in the spirit of synchronized is provided as well (dependency_lock); it is built on the same FIFO files.
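
A rough sketch of how these primitives and a per-task wrapper could fit together; the function bodies below are reconstructed from the constructs shown earlier, not taken from the actual jet9-multitask-flow code:

 # Reconstruction for illustration; the real library may differ.
 dependency_init() { mkfifo "$1.fifo"; }

 dependency_wait() {
     # A missing FIFO means the dependency was already marked done.
     [ -p "$1.fifo" ] && read none 2>/dev/null < "$1.fifo"
     return 0
 }

 dependency_mark() {
     mv "$1.fifo" "$1.fifo.$$" 2>/dev/null || return 0
     : <> "$1.fifo.$$"     # broadcast: release all waiters
     rm -f "$1.fifo.$$"
 }

 run_task() {              # usage: run_task name dep1 dep2 ...
     task=$1; shift
     for dep in "$@"; do dependency_wait "$dep"; done
     do_task "$task"       # hypothetical: execute the task body
     dependency_mark "$task"
 }

Each task would then be started as, say, run_task nginx network mysqld &, and the parent shell would simply wait for all of them.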

The functions of the last group organize the interaction with real tasks: with the system startup scripts, with service scripts, and so on. Some of these functions obtain the tasks, their dependencies, and the relations between tasks; others execute the tasks. These functions can be redefined to match specific conventions for annotating dependencies and executing init scripts, leaving the scripts themselves unchanged. For example, the parallel launch of rc-scripts for FreeBSD was implemented according to this principle.

How multitask-flow can and cannot be used


jet9-multitask-init, which starts the Jet9 operating environment, is built on top of jet9-multitask-flow and replaces the usual init scripts.

init itself stays in place and, as before, is responsible only for switching the operation mode (runlevel); the tasks of each mode are executed by jet9-multitask-init. The scheme is used both on the hardware nodes and in TrueVirtual containers; on Linux, jet9-multitask-init has worked alongside the stock init since Debian 6.

runit can be combined with multitask-flow, for example when starting services in Docker containers: multitask-flow performs the dependent start-up tasks, and runit then supervises the long-running daemons. Compared to pure runit, this gives a correct start order and a parallel start of independent branches.



In Jet9 we moved the start of user containers to multitask-flow. For most sites the LAMP environment is used, and in that case only apache and mysqld run inside the container. mysqld has its own supervisor, mysqld_safe, and for apache the supervisor function is performed by its root process. Yet another supervisor overseeing these supervisors is clearly redundant. On the rare occasions when a universal supervisor is still needed, runit is used alongside multitask-flow. In other web environments the application server most often has its own supervisor too, and it is enough to start the services via multitask-flow, as in LAMP. For example, Ruby on Rails applications are usually served by Unicorn, whose master process itself looks after its workers; runit could only restart the unicorn master, which rarely dies on its own. eye can do more: besides restarting, it can track memory consumption and check that the application actually answers HTTP requests. Where such control is required, eye is used.

multitask-flow is also used when switching operation modes in Jet9 failover clusters. Does a parallel start matter there? It does: every switchover restarts services on the backup node, and while they start the sites are down. With an SLA of 99.99% (about 5 minutes of allowed downtime), the 15-30 seconds saved on every switchover are significant.

The parallel start of rc-scripts in FreeBSD also needs further work. In the current version, dependencies are extracted in a rather clumsy way through awk (three awk launches for each script), and this definitely needs improvement. In addition, several hundred system rc-scripts and scripts from packages have to be tested: their dependencies may be written incompletely, and they may have concurrent-access conflicts. With a sequential start, wrong dependencies could stay masked, but with a parallel start they can surface.
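
A sketch of collapsing the three launches into a single awk pass over the rcorder(8) annotations:

 # Print "script keyword value" triples for PROVIDE/REQUIRE/BEFORE
 # in one awk run over all scripts instead of three runs per script.
 awk '/^# (PROVIDE|REQUIRE|BEFORE):/ {
     key = $2; sub(/:$/, "", key)
     for (i = 3; i <= NF; i++) print FILENAME, key, $i
 }' /etc/rc.d/*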

Most likely, multitask-flow can also speed up the start of systems based on busybox and OpenWRT, but in practice this has not been verified yet. OpenWRT already has some groundwork for integration with systemd, so perhaps an alternative is no longer needed there.

Related Links






Source: https://habr.com/ru/post/268763/

