How to synchronize parallel shell processes using named pipes (FIFO files) as condition variables; how to perform dependent tasks in parallel, in topological order, with a minimum of tools: a POSIX shell, mkfifo, and a POSIX kernel; how parallel launch speeds up the boot of embedded systems and *BSD (the FreeBSD rc stage goes from 27 to 7 seconds) and the start of applications in Docker, LXC and jail user containers; and how it improves uptime in Jet9 failover clusters.
Contents

- Why do you need tasks with dependencies and what parallel launch gives them
- Parallel vs sequential
- Performing tasks with dependencies
  - rcorder, topological sorting and sequential execution
  - upstart, events and asynchronous execution
  - systemd, dependencies and asynchronous execution
  - runit, parallel execution and synchronization when performing a single task
- Supervisors or schedulers
  - multitask-flow, parallel execution of dependent tasks
- Means for synchronization of parallel shell processes
  - Lock file as mutex
  - FIFO file as a condition variable
    - Constructs for a condition variable in a shell
    - Requirements for the working environment
- The design of multitask-flow
  - Parallel execution and synchronization on FIFO files
- How multitask-flow can and cannot be applied
- Related links

Why do you need tasks with dependencies and what parallel launch gives them
Tasks with pronounced dependencies come up often: booting a server, changing its operation mode, and starting or stopping user containers. Most of our Linux and FreeBSD hardware does not boot into the standard installation of a distribution but into our own operating environment: a compact image located on a RAM disk, containing the system programs required for the main work profile plus various auxiliary utilities. Starting a service, initializing a device, and applying settings are separate tasks that are present or absent depending on the purpose of the server, and these tasks must be performed in the correct order. For example, if network configuration starts before the network card driver is loaded, the network will not work. Different applications run in user containers, and the correct operation of some of them may depend on others, which is why their start and stop order matters. For example, if an application server is started before the database server, it may immediately stop because it cannot connect to the database.
There are two generally accepted approaches to starting the system. The first is to specify the order in which tasks are performed explicitly. To do this, you can either list the tasks in execution order in a shell script, or number the tasks, put the number into each task's file name, and then execute them sorted by file name:
S23ntp S25mdadm S91apache2
The second approach is to specify dependencies between the start scripts, that is, to list which conditions are needed before a task can run and which conditions the task creates after it completes.
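For example, dependencies can be declared as rcorder(8)-style comment annotations at the top of each script; the annotation keywords are real, the service names here are illustrative:

```shell
#!/bin/sh
# PROVIDE: apache2
# REQUIRE: NETWORKING mysql
# BEFORE: LOGIN

# ...the script body starts the service as usual...
```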
Although specifying dependencies is more complicated than numbering, it is much more convenient and is now used almost everywhere.
The main drawback of explicit numbering is that the developer first needs to identify the dependencies on other tasks, then find the numbers of those dependencies and select a suitable number for the new task, and in the process it may turn out that the numbering was unfortunate and has to be changed for some of the scripts. With execution by dependencies, the main advantage is how easy it is to add a task to an already established process: for a new task it is enough to list its dependencies on other tasks. Another benefit is the possibility of parallel execution, because only dependent tasks have to run one after another, while tasks that do not depend on each other can run in parallel.
Parallel vs sequential
Parallel execution when performing tasks has two key advantages:
- if tasks are not limited to a single resource (for example, one processor), then with parallel execution they will be completed earlier than with a sequential one;
- if one of the tasks stalls for some reason, it does not block tasks that do not depend on it, and they continue to run.
Regarding the latter, some readers are probably familiar with the situation when sendmail takes a long time to start because of network problems, waiting tens of minutes for DNS responses, making it impossible to log in to the server because sshd comes later than sendmail in the start sequence. With a parallel start, as long as sshd does not declare a dependency on sendmail, sendmail will not hold it up.
Keep in mind that the absence of an explicitly specified dependency does not mean the absence of a dependency as such: a dependency can be indirect, for example, several tasks may access a resource that does not allow simultaneous use (writing to the same file, changing a program's configuration, initializing a device). With sequential execution such a situation is impossible and therefore may go unaccounted for, but with parallel execution such resources may have to be identified and protected from simultaneous use. In practice this is rare, since most tasks ultimately run system utilities that handle locking themselves, or the kernel does it when its resources are accessed.
Performing tasks with dependencies
rcorder, topological sorting and sequential execution
The simplest way to execute by dependencies, rcorder(8), appeared in NetBSD 1.5 in 2000. Using the PROVIDE, REQUIRE and BEFORE annotations, rc scripts describe the dependencies between scripts. The rcorder utility takes a list of scripts, extracts the relationships from them, builds a dependency graph, and returns the list of scripts in the order of a topological sort of that graph.
Running scripts in this order guarantees that dependent scripts are launched only after their dependencies have completed. Scripts are executed one at a time; parallel start is not implemented. Therefore no synchronization of parallel processes is required, and the implementation is simple and reliable.
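The ordering itself is plain topological sorting, which can be sketched with the POSIX tsort(1) utility: each input pair "A B" means A must come before B (the task names are illustrative):

```shell
#!/bin/sh
# Dependency pairs: the driver comes before the network setup,
# the network setup comes before apache2.  tsort prints the tasks
# in an order that respects every pair.
printf '%s\n' \
  "netif network" \
  "network apache2" | tsort
# prints: netif, network, apache2 (one per line)
```

For a chain like this the order is fully determined; with independent branches tsort may print any valid topological order.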
A similar mechanism appeared later in Debian Lenny, where the Required-Start, Required-Stop and Provides annotations were added to SysV init scripts, describing dependencies for the different actions (start and stop). The execution order is then computed from these dependencies, numbers are added to the script names, and the scripts are placed in the startup directory:
S23ntp S25mdadm S91apache2
upstart, events and asynchronous execution
Upstart appeared in 2006 in Ubuntu and was later adopted by many other distributions. Instead of specifying relationships between tasks, a condition on events is described with the start on directive: an event, or a logical expression over events, under which the task should be executed. For example, services that require the network can specify the NETWORK_AVAILABLE event as their start condition, and they will be launched once someone sends the NETWORK_AVAILABLE event to the system. A condition can express not only dependence on another task but also more complex things, such as the runlevel or the state of other tasks.
With this approach tasks are executed asynchronously, at the moment an event occurs and makes some condition involving it true. If several tasks are waiting for one event, they are executed simultaneously. Event dispatch and task execution are handled by the init process. This gives all the advantages: fast start and localization of problem branches.
systemd, dependencies, and asynchronous execution
Systemd appeared later, included an even wider range of functions, captured upstart's niche, and is now used as the subsystem for configuration, management, service execution and logging on most popular Linux distributions.
Unlike upstart, systemd uses dependencies, not events, to start services. These dependencies are more complex and varied than in rcorder: there are ordering relations (start a task before or after another task) and requirement relations (another task is desirable, required, or must be absent). Tasks (units) are divided into several types: services (daemons), devices, file system mounts, and so on.
Tasks are executed in parallel, as in upstart. The way dependencies are described allows checking them at run time for deadlocks, that is, cyclic dependencies (a task that, through a chain of other tasks, ultimately depends on itself). Where possible, systemd tries to resolve such problems by removing non-essential tasks (Wants=) from the execution transaction, and it also checks conflict dependencies (Conflicts=), not allowing conflicting tasks to start.
runit, parallel execution and synchronization when performing a single task
Runit is positioned as a lightweight replacement for init and the script startup system, with a clear design and simple configuration. It is also convenient for supervising services or groups of services, for example individual application servers and databases. Because of its simplicity and portability, runit is often used on embedded systems or for managing web applications.
There are no dependencies as such in runit. Instead, a service's run script invokes the sv command to start the services it needs. sv itself checks whether a service is already running and prevents duplicate launches of the same service. This is enough to allow parallel starts, but it cannot detect the cyclic dependencies that lead to deadlocks.
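A minimal sketch of this pattern, following the run-script idiom from the runit documentation; the service name, paths and the app binary are illustrative:

```shell
#!/bin/sh
# /etc/sv/app/run -- runit run script (illustrative)
sv start mysqld || exit 1    # ensure the dependency is up; sv avoids duplicate starts
exec /usr/local/bin/app      # exec, so runsv supervises the real process directly
```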
runit is designed to manage services, that is, constantly running daemons: it checks the process and restarts it when necessary. Its logic fits poorly with one-time tasks with dependencies, such as obtaining network settings via DHCP.
Supervisors or Schedulers
Of the tools above, only rcorder is a pure scheduler for running dependent tasks. All the others are designed to start services with dependencies and then watch over their existence, that is, they are service supervisors. systemd and upstart also support one-time tasks (Type=oneshot or task) while preserving all the dependency logic, and thus allow you to manage a machine's boot and operation conveniently and in detail. The more formal way of describing dependencies in systemd, and the use of targets as a substitute for operation modes (runlevels), is a major evolution over upstart. But both upstart and its successor systemd are complex and cumbersome tools that are difficult to integrate into an operating environment that differs from their main target.
The focus on covering all aspects of the operating system, and the monolithic nature of the result, make it difficult or impossible to embed systemd into an environment where it would be given only a narrow responsibility. For this reason, to manage individual services and on embedded systems, people still use runit, daemontools, eye, supervisord, or their own scripts that manage daemons with their particular features in mind.
upstart and systemd, along with runit, are advertised as means of keeping services (daemons) operational. There is a fair amount of cunning in this claim, since by keeping a service operational they mean restarting it after it dies. This betrays their genetic kinship with init, one of whose purposes was to run getty, the terminal-servicing process. Since terminals were the main means of working with and managing Unix, terminal operability could be called the second most critical function after the kernel itself. Accordingly, the most important, very first process was given the job of restarting getty processes, both on normal terminal release and on getty crashes. Why getty died or the terminal was freed was irrelevant; what mattered was that there was no getty on the terminal and the ability to log in to the computer was lost.
The evolutionary descendants of init took on restarting not only getty but all constantly running services. Yet service supervision remained at a primitive level: if a daemon dies, it is simply restarted. More serious monitoring, both of service quality (absence of internal errors, timely request processing) and of technical parameters (memory size, CPU usage), still has to be done by other means. Therefore systemd and its analogues are, in pure form, poorly suited for ensuring reliable operation, and when integrating them with other monitoring and management tools one must additionally account for the competing influence of the supervising systems.
In some cases an automatic restart of a dead service is outright dangerous: if the service is involved in storing data, unsaved data or a broken structure can leave errors in the stored data, and if after an emergency restart work continues without checking data integrity and fixing possible corruption, the errors multiply. For simple and reliable operation a more correct approach is to tailor the monitoring of each service, and the response to its failure, to that particular service's characteristics, rather than handling all services automatically in the same one-size-fits-all way.
The use of large and actively growing complexes like systemd carries another risk: either you pin yourself to a chosen version of the program and struggle to backport bug fixes, or you live with the constant probability of a large rework when upgrading to a new version.
multitask-flow, parallel execution of dependent tasks
To start our varied equipment and control its operating modes, we needed a minimalist tool for executing tasks with dependencies: one that would not try to take on extra state-control and restart functions, but would simply do its main job well (execution by dependencies) and would be easy to integrate in different places. Besides easy integration, portability and ease of maintenance were required. With some improvements it later found other applications: launching user containers, managing cluster operating modes, and starting and managing the operating modes of other subsystems.
The default option was a C program. Dependencies are extracted from the scripts themselves or produced by an external script, and are supplied in a form compatible with tsort(1). The list is used to build the dependency digraph, which is checked for cycles, after which the tasks are executed in the order in which their dependency counters drop to zero. Unlike upstart and systemd, which require Linux kernels 2.6.24 and 3.0 for their needs, here the usual fork-exec-wait scheme and any POSIX kernel suffice. But then a more convenient alternative was chosen: a utility plus a small POSIX shell library, jet9-multitask-flow.
Using the shell gives obvious advantages: portability, that is, it works in any environment with a POSIX shell, and flexibility, allowing the scheduler to be adapted to the specifics of the scripts it manages: the way dependencies are described, the way scripts are run. There are also disadvantages: the language makes it easy to make hard-to-find errors and, being interpreted, executes slowly. Thanks to the small code size the first problem turns out to be insignificant, and execution speed is not a problem for our application either: the time spent on the control functions is negligible compared to the time spent executing the tasks.
It looks paradoxical, but executing a set of several hundred tasks with the scheduler written in sh turns out to be several times faster than with a similar scheduler written in C. There is a rational explanation: shell scripts that use a large shared library run faster if the library is loaded once into the parent scheduler process and all the hundreds of scripts are then executed in subshells. If a C scheduler runs the hundreds of scripts via exec(), much more time is spent initializing the shell and loading all the libraries for each script.
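A minimal sketch of this effect; the do_task function stands in for a real library function, and the task names are illustrative:

```shell
#!/bin/sh
# The "library" (here just one function) is loaded once by the parent shell.
do_task() { echo "done: $1"; }

# Each task runs in its own subshell and inherits the loaded functions,
# so no new interpreter is started and no library is re-read per task.
results=$(
  for t in net fs app; do
    ( do_task "$t" ) &      # one parallel subshell per task
  done
  wait                      # the parent waits for all background subshells
)
printf '%s\n' "$results" | sort   # prints the three completion lines, sorted
```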
Means for synchronizing parallel shell processes
Lock file as mutex
Shell dialects that can start parallel processes (called background processes in this area) also have the wait command, which, executed in the parent process, waits for the completion of all child processes. The analogue in multithreaded programming is creating several threads and then waiting, via join, for the completion of all threads created from the main thread. This is a means of synchronization, but often not a suitable one: you can only wait for your descendant processes, and it is impossible to wait for processes that are not your descendants.
Lock-file mutexes have long been used in shell scripts to prevent simultaneous access to a resource. As a rule, an atomic operation is used: creating a file, or creating a directory via mkdir. If the operation fails, the resource is considered busy and the script aborts. For files, the noclobber option can prevent writing to an existing file, and creating a directory fails if it already exists. Note that the atomicity of these operations is guaranteed not by the language or the runtime but by the file system, so behavior may differ under different conditions.
This scheme is simple and fine for most cases where duplicate execution must be prevented, for example for long-running scripts started frequently from cron, or at daemon startup. If instead of aborting you need to wait for the resource to be released, the usual technique is periodic polling for the absence of the lock, in effect a spinlock with sleeping: if the file or directory exists, the script sleeps for a while and then tries again. A spinlock without sleeping is pointless and harmful in the shell: given how shell scripts are used, the locked period can be anywhere from a fraction of a second to tens of seconds, and a non-stop check would load both the CPU and the kernel, taking resources away from everyone and slowing down, among others, the competing process.
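Both variants can be sketched on a mkdir-based lock; the lock path is illustrative:

```shell
#!/bin/sh
lock="${TMPDIR:-/tmp}/demo.lock.$$"

# Variant 1: mutex that aborts -- suitable for "do not run twice".
if ! mkdir "$lock" 2>/dev/null; then
  echo "already running" >&2
  exit 1
fi
rmdir "$lock"

# Variant 2: spinlock with sleeping -- wait until the lock can be taken.
while ! mkdir "$lock" 2>/dev/null; do
  sleep 1          # back off instead of burning CPU in a tight loop
done
echo "lock acquired"
# ...critical section...
rmdir "$lock"      # release the lock
```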
Sleeping saves the processor but slows the reaction to the release of the lock. Previously, when the sleep command understood only integer arguments, the minimum sleep was 1 second. Today the value can be smaller, but it is still some constant, which may be too large for short actions taking a few milliseconds. For a fast reaction we want a mechanism that waits for a resource to be released without sleeping for a fixed time, but also without busy waiting.
One way to stop and be resumed asynchronously is signal delivery, for example sleeping until SIGALRM arrives. This is available to the shell too, but it requires extra, non-trivial machinery for notifying the lock owner about the waiting processes.
There are also flock(2) and fcntl(2) locks, but the POSIX shell has no built-in access to them, and the utilities wrapping them are not available everywhere.
FIFO file as a condition variable

Unix has long had named pipes, FIFO files. The classic Unix pipe, created by pipe(2), connects the output of one process to the input of another: what one writes, the other reads. An anonymous pipe has no name and can only be shared by related processes, such as a parent and its children. A FIFO file, created by mkfifo(2), is the same pipe but with a name in the file system, so any processes allowed to open the file can communicate through it.

Pipes also have a blocking property that is useful here: a process that opens a FIFO file for reading is blocked until some process opens it for writing, and vice versa. Several processes can be blocked on the same FIFO file at once, and they are all released together when the other end is opened.

This gives a simple scheme for a condition variable: the condition variable is created by creating a FIFO file; waiting on it is opening the FIFO file for reading, which blocks the caller; notification is opening the FIFO file for writing and closing it, which releases all the blocked readers at once. No data needs to be transferred; only the blocking behavior of open is used.

Thus, on FIFO files we get the condition variable operations wait and broadcast (notifyAll in Java terms).
Constructs for a condition variable in a shell

Creating the condition variable (initialization) is creating the FIFO file:

mkfifo $lock_file

Waiting on the condition variable is reading from the FIFO file; the process blocks until a broadcast:

read none < $lock_file

Notifying all waiting processes (broadcast) must avoid a race condition with processes that begin waiting on a re-created condition variable, so the FIFO file is first renamed, then opened for reading and writing, then removed:

mv $lock_file $lock_file.$$
: <> $lock_file.$$
rm -f $lock_file.$$
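Putting the three constructs together, a toy demonstration with one waiter released by one broadcast; the paths are illustrative, and the sleep only gives the waiter time to block first:

```shell
#!/bin/sh
cv="${TMPDIR:-/tmp}/cv.$$"

out=$(
  mkfifo "$cv"                 # init: create the condition variable

  ( read none < "$cv"          # wait: the open blocks until a writer appears
    echo "released" ) &

  sleep 1                      # demo only: let the waiter block first
  mv "$cv" "$cv.b"             # broadcast: rename away from new waiters...
  : <> "$cv.b"                 # ...open read-write (does not block on Linux/FreeBSD)
  rm -f "$cv.b"                # ...and remove the renamed FIFO
  wait
)
echo "$out"
```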
Requirements for the working environment

FIFO files are created either by the mkfifo(2) system call (part of POSIX.1) or by the mkfifo(1) utility (part of POSIX.2), which is available wherever a POSIX shell is available. This works on Linux, FreeBSD and, presumably, other Unix-like systems.
There is a caveat: to unblock the waiting processes, you need to open the FIFO file for writing and immediately close it. But if there is not a single process reading from the pipe (nobody is interested in the resource), the process opening the file for writing will block, and will stay blocked until a reader appears; if none appears, the block is permanent. On Linux and FreeBSD this problem is solved by the fact that a FIFO file can be opened for reading and writing simultaneously, in which case the process is never blocked. The logic is that a FIFO file need not block on open if it is open for both reading and writing, even if by one and the same single process. POSIX leaves this behavior undefined, but the Linux kernel (fs/fifo.c) mentions this case specifically, with a justification of why such an open does not block:

/*
 * O_RDWR
 * POSIX.1 leaves this case "undefined" when O_NONBLOCK is set.
 * This implementation will NEVER block on a O_RDWR open, since
 * the process can at least talk to itself.
 */
FreeBSD does not document this explicitly, but behaves the same way. On other systems this may not hold; for example, on AIX the result of opening a FIFO file with O_RDWR is undefined. So before using FIFO files this way on another system, check how it handles opening a FIFO file for reading and writing at once.
The design of multitask-flow

Parallel execution and synchronization on FIFO files

multitask-flow is a utility plus a small POSIX shell library; the managed tasks remain ordinary shell scripts. First the list of tasks and the dependencies between them are obtained, and a condition variable (a FIFO file) is created for every task. Each task is started in its own subshell. Before running its payload, a task waits on the condition variables of all the tasks it depends on; after finishing, it performs a broadcast on its own condition variable, releasing the tasks that depend on it. Tasks without unmet dependencies thus run in parallel, and each task starts as soon as its last dependency completes.
This method is somewhat wasteful, since a separate subshell, and therefore a separate process, is launched for each task: executing a flow of hundreds of tasks launches hundreds of processes. But the expense is not as large as it may seem: the code and data segments are almost entirely shared with the main process, and only the stack and a small amount of other allocated memory is private to each process. For Linux with dash on a 3.16 kernel on 64-bit Intel this is about 160 KB per process, for FreeBSD in a similar situation about 120 KB; that is, 100 simultaneously waiting tasks require 12-16 MB. The blocking, waiting, and notification of unblocking are implemented on FIFO files, by the principle described earlier, and put into separate functions: dependency_init, dependency_wait and dependency_mark.
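A hypothetical sketch of how these three functions might look on top of the FIFO constructs; the real jet9-multitask-flow code may differ, and $DEP_DIR and the function bodies are assumptions:

```shell
#!/bin/sh
DEP_DIR="${TMPDIR:-/tmp}/deps.$$"
mkdir -p "$DEP_DIR"

# Create the condition variable for task $1.
dependency_init() { mkfifo "$DEP_DIR/$1"; }

# Block until task $1 broadcasts completion (read returns on EOF).
dependency_wait() { read none < "$DEP_DIR/$1" || :; }

# Broadcast completion of task $1: rename, open read-write, remove.
dependency_mark() {
  mv "$DEP_DIR/$1" "$DEP_DIR/$1.$$" &&
  : <> "$DEP_DIR/$1.$$"
  rm -f "$DEP_DIR/$1.$$"
}

dependency_init db
out=$(
  ( dependency_wait db; echo "db ready" ) &   # a task that depends on "db"
  sleep 1
  dependency_mark db                          # "db" finishes and broadcasts
  wait
)
echo "$out"
rm -rf "$DEP_DIR"
```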
Besides the wait plus notifyAll/broadcast pair, there is a mutual exclusion primitive, an analogue of synchronized, provided by dependency_lock. It, too, is implemented on FIFO files.
The functions of the last group organize the interaction with the real tasks: system startup scripts, service scripts and so on. Some of these functions obtain the tasks, their dependencies and the relations between them; others execute the tasks. These functions can be redefined to match specific rules for annotating dependencies and running init scripts, leaving the scripts themselves unchanged. For example, the parallel launch of rc scripts for FreeBSD was made on this principle.

How multitask-flow can and cannot be applied
jet9-multitask-init is used on Jet9 servers as a replacement for init. It performs the duties of init and boots the system through jet9-multitask-flow. Like init, it also inherits orphaned processes and reaps the terminated ones. Switching between operating modes (runlevels) is likewise expressed through sets of dependent tasks. In TrueVirtual containers with full virtualization, Linux systems that used the init from Debian 6 were switched to jet9-multitask-init.
Instead of runit, multitask-flow can be used to start services in containers, for example in Docker. Compared to runit this adds:

- one-time tasks (obtaining settings via DHCP, service discovery, DNS registration)
- marking a task as "ready" via mark, so that dependent tasks can wait for it
- dependency checking via tsort, with detection of cyclic dependencies
In Jet9 we moved the start of user containers to multitask-flow. Most sites use a LAMP environment, in which case only apache and mysqld run inside the container. mysqld has its own supervisor, mysqld_safe, and for apache the supervisor role is played by its root process. Yet another supervisor overseeing these supervisors would be clearly redundant. In the rare cases when a universal supervisor is still needed, runit is used alongside multitask-flow. In other web environments the application server most often has its own supervisor as well, and it is enough to start the services, as in LAMP, via multitask-flow.
For example, a Ruby on Rails application running under Unicorn: the Unicorn master process itself supervises its workers. runit can supervise unicorn, restarting it if it dies, but nothing more; unicorn's internal state stays out of its sight. eye can additionally watch memory usage and check the service over HTTP, so where such monitoring is required, eye is used.
multitask-flow also improved uptime in the Jet9 failover clusters. What does start speed have to do with uptime? When a node fails, its services are started on a backup node, so the downtime is the failure detection time plus the start time. Under an SLA of 99.99% (about 5 minutes of downtime per month), the saved seconds matter: a switchover now takes 15-30 seconds.
The FreeBSD support is not finished. In the current version dependencies are extracted through awk in a rather slow way (three awk launches per script), and this definitely needs work. In addition, several hundred system rc scripts and scripts from packages need testing: their dependencies may be written incompletely, and there may be unaccounted conflicts of simultaneous access. With a sequential start wrong dependencies could be masked; in a parallel start they can show up.

Most likely multitask-flow can also speed up the start of systems based on busybox and OpenWRT, but in practice this has not yet been verified. And OpenWRT already has some groundwork for integration with systemd, so perhaps it no longer needs an alternative.

Related links
P.S. On Jet9 hosting, besides PHP and Perl, we recently added support for Ruby, Python and Java. Currently available:

- Java: OpenJDK 7; Tomcat 6, 7, 8
- Python: mod_wsgi / Tornado; 2.6, 2.7, 3.3, 3.5
- Ruby: mod_passenger / Unicorn / Puma; 1.8.7, 1.9.3, 2.0, 2.2