
Multitasking in the Linux kernel: workqueue

We continue the topic of multithreading in the Linux kernel. Last time I talked about interrupts, their handling, and tasklets, and since this was originally meant to be a single article, I will refer to tasklets in my workqueue story, assuming the reader is already familiar with them.
As last time, I will try to make the story as detailed and accessible as possible.

Articles in the series:
  1. Multitasking in the Linux kernel: interrupts and tasklets
  2. Multitasking in the Linux kernel: workqueue
  3. Protothread and cooperative multitasking



Workqueue


Workqueues are more complex and heavyweight entities than tasklets. I will not even try to describe all the subtleties of the implementation here, but the most important things I hope to analyze in more or less detail.
Workqueues, like tasklets, serve for deferred interrupt handling (although they can be used for other purposes as well), but, unlike tasklets, they execute in the context of a kernel process; consequently, they do not have to be atomic and may use the sleep() function, various synchronization primitives, and so on.
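For illustration, here is a minimal sketch of what a work handler might look like (the function name is hypothetical); the key point is that, running in process context, it may sleep, which a tasklet never could:

#include <linux/workqueue.h>
#include <linux/delay.h>

/* A work handler runs in process context, so it is allowed to sleep. */
static void my_work_fn(struct work_struct *work)
{
        pr_info("work started\n");
        msleep(100);            /* sleeping here is fine, unlike in a tasklet */
        pr_info("work finished\n");
}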

Let's first see how workqueue processing is organized in general. The picture shows it very approximately and in simplified form; how everything actually happens is described in detail below.


Several entities are involved in this dark matter.
First, the work item (or, for brevity, simply work) is a structure describing the function (for example, an interrupt handler) that we want to schedule. It can be thought of as an analogue of the tasklet structure. When scheduling tasklets, we added them to queues hidden from the user; now we have to use a special queue, the workqueue.
Tasklets are drained by the scheduler function, whereas a workqueue is processed by special threads called workers.
Workers provide asynchronous execution of works from the workqueue. Although they invoke works in queue order, in the general case strictly sequential execution is out of the question: after all, preemption, sleeping, waiting, and so on happen here.

In general, workers are kernel threads, that is, they are managed by the main Linux kernel scheduler. However, workers partially interfere with scheduling in order to additionally organize the parallel execution of works. More on this below.

To outline the main features of the workqueue mechanism, I propose to explore the API.

About the queue and its creation


alloc_workqueue(fmt, flags, max_active, args...) 

The fmt and args parameters are a printf-style format string for the queue name and its arguments. The max_active parameter limits the number of works from this queue that can execute in parallel on one CPU.
A queue can be created with a number of flags, the most important of which are discussed below.

Particular attention should be paid to the WQ_UNBOUND flag. Based on this flag, queues are divided into bound and unbound.
In bound queues, works are tied to the current CPU at the moment they are added; that is, in such queues works execute on the core that scheduled them. In this respect, bound queues resemble tasklets.
In unbound queues, works can execute on any core.

An important feature of the workqueue implementation in the Linux kernel is the additional organization of parallel execution that exists for bound queues. It is described in more detail below; for now I will just say that it is done so as to use as little memory as possible while not letting the processor stand idle. All of this is implemented on the assumption that one work does not consume too many processor cycles.
Unbound queues have nothing of the kind. In essence, such queues simply provide works with a context and launch them as early as possible.
Thus, unbound queues should be used when intensive processor load is expected, since in that case the scheduler itself will take care of parallel execution across several cores.

By analogy with tasklets, works can be given an execution priority, normal or high. The priority is common to the whole queue. By default a queue has normal priority; if the WQ_HIGHPRI flag is set, it is, accordingly, high.

The WQ_CPU_INTENSIVE flag only makes sense for bound queues. It means opting out of the additional organization of parallel execution. The flag should be used when the works are expected to consume a lot of CPU time; in that case it is better to shift the responsibility to the scheduler. This is described in more detail below.

The WQ_FREEZABLE and WQ_MEM_RECLAIM flags are specific and beyond the scope of the topic, so we will not dwell on them in detail.
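To make the flags a bit more tangible, here is a small sketch of creating two queues, one bound and high-priority, one unbound; the queue names and the error handling are illustrative assumptions of mine, not something from the original article:

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_hi_wq;
static struct workqueue_struct *my_unbound_wq;

static int my_create_queues(void)
{
        /* Bound, high-priority queue; at most one of its works
         * may run at a time on each CPU (max_active = 1). */
        my_hi_wq = alloc_workqueue("my_hi", WQ_HIGHPRI, 1);
        if (!my_hi_wq)
                return -ENOMEM;

        /* Unbound queue: works may run on any core, the choice is
         * left to the scheduler, which suits CPU-heavy works. */
        my_unbound_wq = alloc_workqueue("my_unbound", WQ_UNBOUND, 0);
        if (!my_unbound_wq) {
                destroy_workqueue(my_hi_wq);
                return -ENOMEM;
        }
        return 0;
}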

Sometimes it makes sense not to create queues of your own but to use the shared system ones (the default one is system_wq).
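As a sketch of the most common case: schedule_work() places a work on system_wq, the default shared bound queue. The work and function names below are hypothetical:

#include <linux/workqueue.h>

static void my_log_fn(struct work_struct *work)
{
        pr_info("running on the default shared workqueue\n");
}

static DECLARE_WORK(my_log_work, my_log_fn);

static void my_submit(void)
{
        /* schedule_work() is shorthand for queue_work(system_wq, ...). */
        schedule_work(&my_log_work);
}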

About works and their scheduling


Now let's deal with the works themselves. First, a look at the declaration, initialization, and preparation macros:
 DECLARE(_DELAYED)_WORK(name, void (*function)(struct work_struct *work)); /* declare and initialize a work */
 INIT(_DELAYED)_WORK(_work, _func); /* initialize an already declared work */
 PREPARE(_DELAYED)_WORK(_work, _func); /* change the function of an already declared work */

Works are added to a queue using the functions:
 bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
 bool queue_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay); /* the work will be placed on the queue no earlier than after delay */
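A typical pattern is to embed the work_struct in your own data and recover it in the handler with container_of(). The structure and function names below are hypothetical; this is only a sketch of how the pieces above fit together:

#include <linux/jiffies.h>
#include <linux/workqueue.h>

/* Hypothetical driver data with embedded work items. */
struct my_device {
        struct work_struct  work;
        struct delayed_work dwork;
        int                 value;
};

static void my_dev_work_fn(struct work_struct *work)
{
        /* Recover the enclosing structure from the work pointer. */
        struct my_device *dev = container_of(work, struct my_device, work);

        pr_info("immediate work, value = %d\n", dev->value);
}

static void my_dev_delayed_fn(struct work_struct *work)
{
        struct delayed_work *dwork = to_delayed_work(work);
        struct my_device *dev = container_of(dwork, struct my_device, dwork);

        pr_info("delayed work, value = %d\n", dev->value);
}

static void my_dev_schedule(struct workqueue_struct *wq, struct my_device *dev)
{
        INIT_WORK(&dev->work, my_dev_work_fn);
        queue_work(wq, &dev->work);

        INIT_DELAYED_WORK(&dev->dwork, my_dev_delayed_fn);
        /* This work will start no earlier than roughly 500 ms from now. */
        queue_delayed_work(wq, &dev->dwork, msecs_to_jiffies(500));
}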

Here it is worth dwelling on the details. Although we pass the queue as a parameter, works are not actually placed into the workqueue itself, as it might seem, but into a completely different entity: the work list (queue) of the worker_pool structure. The worker_pool structure is essentially the most important entity in the organization of the workqueue mechanism, even though for the user it stays behind the scenes. Workers operate on these pools, and it is the pools that hold all the essential information.

Now let's see what pools exist in the system.
To begin with, the pools for bound queues (shown in the picture). For each CPU, two worker pools are statically allocated: one for high-priority works, the other for works with normal priority. That is, if we have four cores, there will be only eight bound pools, no matter how many workqueues exist.
When we create a workqueue, a pool_workqueue (pwq) is allocated for it on each CPU. Each such pool_workqueue is associated with the worker pool that is allocated on the same CPU and matches the queue's priority. Through them, the workqueue interacts with the worker pools.
Workers execute works from the worker pool indiscriminately, without distinguishing which workqueue they originally belonged to.



For unbound queues, worker pools are allocated dynamically. All queues can be divided into equivalence classes by their parameters, and a worker pool is created for each such class. The pools are looked up through a special hash table, where the key is the set of parameters and the value, accordingly, is the worker pool.
In reality unbound queues are a little more complicated: whereas for bound queues a pwq and a pool are created per CPU, here they are created per NUMA node; but this is an additional optimization that we will not consider in detail.

All sorts of stuff


For completeness I will also give a few more functions from the API, but I will not dwell on them in detail:
 /* wait for the work to complete */
 bool flush_work(struct work_struct *work);
 bool flush_delayed_work(struct delayed_work *dwork);
 /* cancel the work */
 bool cancel_work_sync(struct work_struct *work);
 bool cancel_delayed_work(struct delayed_work *dwork);
 bool cancel_delayed_work_sync(struct delayed_work *dwork);
 /* destroy the queue */
 void destroy_workqueue(struct workqueue_struct *wq);
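A typical place where these are combined is a teardown path, say in a module's exit function, where nothing must be left queued or running before the queue is destroyed. The sketch below is an assumption of mine, not code from the article:

#include <linux/workqueue.h>

static void my_teardown(struct workqueue_struct *wq,
                        struct work_struct *work,
                        struct delayed_work *dwork)
{
        /* Wait for a queued work to finish... */
        flush_work(work);

        /* ...or cancel pending works and wait for running ones to finish. */
        cancel_work_sync(work);
        cancel_delayed_work_sync(dwork);

        /* Finally release the queue itself. */
        destroy_workqueue(wq);
}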

How workers do their job


Now that we have become acquainted with the API, let's try to understand in more detail how all of this works and is managed.
Each pool has a set of workers that handle the works. The number of workers changes dynamically, adapting to the current situation.
As we have already found out, workers are threads running in kernel context. A worker takes works one after another from the worker pool it is associated with, and those works, as we already know, may belong to different source queues.

Workers can conditionally be in one of three logical states: idle, running, or managing.
A worker may stand idle and do nothing, for example when all the works have already been executed. On entering this state the worker goes to sleep and, accordingly, will not run until it is woken up.
If pool management is not required and the list of scheduled works is not empty, the worker starts executing them. Such workers we will conditionally call running.
If necessary, a worker takes on the role of pool manager. A pool can have either exactly one managing worker or none at all. Its task is to keep the number of workers in the pool optimal. How does it do that? First, workers that have been idle for a long time are removed. Second, new workers are created if three conditions hold at once:
  1. there are works waiting to be executed;
  2. the pool has no idle workers;
  3. there are no workers counted as running.

However, the last condition has its nuances. If the pool serves unbound queues, running workers are not counted at all: for such pools the condition is always considered true. The same applies when a worker is executing a work from a bound queue created with the WQ_CPU_INTENSIVE flag. For bound queues, since workers take works from the common pool (one of the two per core in the picture above), it turns out that some of them are counted as running and some are not. It also follows that the execution of works from a WQ_CPU_INTENSIVE queue may not start immediately, yet those works do not prevent other works from being executed. Now it should be clear why the flag is named as it is, and why it is used when we expect works to run for a long time.

The accounting of running workers is done directly from the main Linux kernel scheduler. Such a regulation mechanism ensures an optimal concurrency level, neither allowing the workqueue to create too many workers nor making works wait in line for too long.

Those who are interested can look at the worker's main function in the kernel; it is called worker_thread().

All the described functions and structures can be found in more detail in the files include/linux/workqueue.h, kernel/workqueue.c, and kernel/workqueue_internal.h. There is also documentation on workqueue in Documentation/workqueue.txt.

It is also worth noting that the workqueue mechanism is used in the kernel not only for deferred interrupt handling (although this is quite a common scenario).

So, we have examined the deferred interrupt handling mechanisms in the Linux kernel, tasklets and workqueues, which are a special form of multitasking. You can read about interrupts, tasklets, and workqueues in the book "Linux Device Drivers" by Jonathan Corbet, Greg Kroah-Hartman, and Alessandro Rubini, although some of the information there is outdated.
In the comments to the tasklet article, Zyoma also recommends "Linux Kernel Development" by Robert Love.

To be continued


In the next part I will talk about protothreads and cooperative multitasking, try to compare all these seemingly different entities, and extract some useful ideas.

Source: https://habr.com/ru/post/244155/

