
Organization of multitasking in the OS kernel

As fate would have it, I happened to deal with the organization of multitasking, or, more precisely, pseudo-multitasking, since the tasks share time on a single processor core. I have already come across several articles on this topic on Habr, and it seemed to me that the community finds it interesting, so I will allow myself to make my modest contribution to covering the subject.
First I will try to cover the kinds of multitasking (cooperative and preemptive), then move on to scheduling principles for preemptive multitasking. The story is aimed mostly at the novice reader who wants to understand how multitasking works at the OS kernel level. But since everything is accompanied by examples that can be compiled, run, and played with, the article may also interest those who are already familiar with the theory but have never tried a scheduler in practice. Those too lazy to read can jump straight to the code, since the example code is taken from our project.
And, of course, multithreaded cats to attract attention.



Introduction


First, we define what the term “multitasking” means. Here is the definition from Russian Wikipedia:
Multitasking is a property of an operating system or execution environment that provides the ability to process several processes in parallel (or pseudo-parallel).

The English Wikipedia gives, in my opinion, a less clear but more detailed definition:
In computing, multitasking is a method where multiple tasks, also known as processes, are performed during the same period of time. The tasks share common processing resources, such as a CPU and main memory. In the case of a computer with a single CPU, only one task is said to be running at any point in time, meaning that the CPU is actively executing instructions for that task. Multitasking solves the problem by scheduling which task may be the one running at any given time, and when another waiting task gets a turn. The act of reassigning a CPU from one task to another one is called a context switch.

It introduces the concepts of resource sharing and, as a matter of fact, scheduling. It is scheduling (of CPU time, first of all) that this article will discuss. Both definitions speak of scheduling processes, but I am going to talk about scheduling in terms of threads.
Thus, we need to introduce one more concept; let's call it a thread of execution: a set of instructions with a defined order, which the processor executes while a program runs.
Since we are talking about multitasking, there can of course be several of these threads of execution in the system. The thread whose instructions the processor is currently executing is called active. Since only one instruction can be executed on a single processor core at any given time, only one thread of execution can be active. The process of choosing the active thread is called scheduling, and the module responsible for this choice is called the scheduler.

There are many different scheduling methods. Most of them can be attributed to two main types:

- non-preemptive (cooperative): the active thread itself decides when to hand control back to the scheduler;
- preemptive: the scheduler can take the processor away from the active thread at any moment.

Let's begin with the non-preemptive scheduling method, since it is very easy to implement.

Non-preemptive scheduler


The non-preemptive scheduler considered here is very simple; the material is given so that beginners can understand multitasking more easily. Anyone who has an idea of it, even a theoretical one, can go straight to the section "Preemptive scheduler".

The simplest non-preemptive scheduler

Imagine that we have several tasks that are short enough in duration that we can simply call them in turn. A task is designed as an ordinary function with a certain set of parameters. The scheduler operates on an array of structures describing these functions: it walks over the array and calls each task function with its given parameters. Having performed the actions the task requires, the function returns control to the scheduler's main loop.

#include <stdio.h>

#define TASK_COUNT 2

struct task {
	void (*func)(void *);
	void *data;
};

static struct task tasks[TASK_COUNT];

static void scheduler(void) {
	int i;

	for (i = 0; i < TASK_COUNT; i++) {
		tasks[i].func(tasks[i].data);
	}
}

static void worker(void *data) {
	printf("%s\n", (char *) data);
}

static struct task *task_create(void (*func)(void *), void *data) {
	static int i = 0;

	tasks[i].func = func;
	tasks[i].data = data;

	return &tasks[i++];
}

int main(void) {
	task_create(&worker, "First");
	task_create(&worker, "Second");

	scheduler();

	return 0;
}


Output:

First
Second

CPU usage chart:



Non-preemptive event-based scheduler

Clearly, the example described above is too primitive. Let's add the ability to activate a specific task. To do this, we add a flag to the task description structure indicating whether the task is active, and, of course, a small API to control activation.

#include <stdio.h>

#define TASK_COUNT 2

struct task {
	void (*func)(void *);
	void *data;
	int activated;
};

static struct task tasks[TASK_COUNT];

struct task_data {
	char *str;
	struct task *next_task;
};

static struct task *task_create(void (*func)(void *), void *data) {
	static int i = 0;

	tasks[i].func = func;
	tasks[i].data = data;

	return &tasks[i++];
}

static int task_activate(struct task *task, void *data) {
	task->data = data;
	task->activated = 1;

	return 0;
}

static int task_run(struct task *task, void *data) {
	task->activated = 0;
	task->func(data);

	return 0;
}

static void scheduler(void) {
	int i;
	int fl = 1;

	while (fl) {
		fl = 0;

		for (i = 0; i < TASK_COUNT; i++) {
			if (tasks[i].activated) {
				fl = 1;
				task_run(&tasks[i], tasks[i].data);
			}
		}
	}
}

static void worker1(void *data) {
	printf("%s\n", (char *) data);
}

static void worker2(void *data) {
	struct task_data *task_data;

	task_data = data;

	printf("%s\n", task_data->str);

	task_activate(task_data->next_task, "First activated");
}

int main(void) {
	struct task *t1, *t2;
	struct task_data task_data;

	t1 = task_create(&worker1, "First create");
	t2 = task_create(&worker2, "Second create");

	task_data.next_task = t1;
	task_data.str = "Second activated";

	task_activate(t2, &task_data);

	scheduler();

	return 0;
}


Output:

Second activated
First activated

CPU usage chart:



Non-preemptive scheduler with message queues

The problems of the previous method are obvious: if someone wants to activate a task twice before it has been processed, the second activation fails and the information about it is simply lost. This problem can be partially solved with a message queue: instead of flags, we add an array that stores a message queue for each task.

#include <stdio.h>
#include <stdlib.h>

#define TASK_COUNT 2

struct message {
	void *data;
	struct message *next;
};

struct task {
	void (*func)(void *);
	struct message *first;
};

struct task_data {
	char *str;
	struct task *next_task;
};

static struct task tasks[TASK_COUNT];

static struct task *task_create(void (*func)(void *), void *data) {
	static int i = 0;

	tasks[i].func = func;
	tasks[i].first = NULL;

	return &tasks[i++];
}

static int task_activate(struct task *task, void *data) {
	struct message *msg;

	msg = malloc(sizeof(struct message));
	msg->data = data;
	msg->next = task->first;
	task->first = msg;

	return 0;
}

static int task_run(struct task *task, void *data) {
	struct message *msg = data;

	task->first = msg->next;
	task->func(msg->data);
	free(data);

	return 0;
}

static void scheduler(void) {
	int i;
	int fl = 1;
	struct message *msg;

	while (fl) {
		fl = 0;

		for (i = 0; i < TASK_COUNT; i++) {
			while (tasks[i].first) {
				fl = 1;
				msg = tasks[i].first;
				task_run(&tasks[i], msg);
			}
		}
	}
}

static void worker1(void *data) {
	printf("%s\n", (char *) data);
}

static void worker2(void *data) {
	struct task_data *task_data;

	task_data = data;

	printf("%s\n", task_data->str);

	task_activate(task_data->next_task, "Message 1 to first");
	task_activate(task_data->next_task, "Message 2 to first");
}

int main(void) {
	struct task *t1, *t2;
	struct task_data task_data;

	t1 = task_create(&worker1, "First create");
	t2 = task_create(&worker2, "Second create");

	task_data.next_task = t1;
	task_data.str = "Second activated";

	task_activate(t2, &task_data);

	scheduler();

	return 0;
}


Output:

Second activated
Message 2 to first
Message 1 to first

CPU usage chart:



Non-preemptive scheduler that preserves activation order

Another problem with the previous examples is that the order in which tasks are activated is not preserved. In effect, each task is given its own priority, which is not always desirable. To solve this problem, we can keep a single message queue and a dispatcher that processes it in order.

#include <stdio.h>
#include <stdlib.h>

#define TASK_COUNT 2

struct task {
	void (*func)(void *);
	void *data;
	struct task *next;
};

static struct task *first = NULL, *last = NULL;

static struct task *task_create(void (*func)(void *), void *data) {
	struct task *task;

	task = malloc(sizeof(struct task));
	task->func = func;
	task->data = data;
	task->next = NULL;

	if (last) {
		last->next = task;
	} else {
		first = task;
	}

	last = task;

	return task;
}

static int task_run(struct task *task, void *data) {
	task->func(data);
	free(task);

	return 0;
}

static struct task *task_get_next(void) {
	struct task *task = first;

	if (!first) {
		return task;
	}

	first = first->next;
	if (first == NULL) {
		last = NULL;
	}

	return task;
}

static void scheduler(void) {
	struct task *task;

	while ((task = task_get_next())) {
		task_run(task, task->data);
	}
}

static void worker2(void *data) {
	printf("%s\n", (char *) data);
}

static void worker1(void *data) {
	printf("%s\n", (char *) data);

	task_create(worker2, "Second create");
	task_create(worker2, "Second create again");
}

int main(void) {
	struct task *t1;

	t1 = task_create(&worker1, "First create");

	scheduler();

	return 0;
}


Output:

First create
Second create
Second create again

CPU usage chart:



Before turning to the preemptive scheduler, I want to add that non-preemptive schedulers are used in real systems, since the overhead of switching tasks is minimal. True, this approach demands a lot of attention from the programmer: he must make sure on his own that no task gets stuck while running; a sketch of this pitfall is shown below.
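
To see the pitfall concretely, here is a small hypothetical sketch reusing the task API from the first example (stuck_worker, polite_worker and do_chunk_of_work are invented names, not project code): one task that never returns starves all the others, because nothing can preempt it.

/* A sketch of the pitfall; assumes the task API from the first example. */
extern void do_chunk_of_work(void *data); /* hypothetical helper */

static void stuck_worker(void *data) {
	for (;;) {
		/* never returns: the scheduler loop never regains control,
		 * so no other task will ever run again */
	}
}

static void polite_worker(void *data) {
	/* does one small, bounded piece of work and returns,
	 * letting the scheduler loop call the next task */
	do_chunk_of_work(data);
}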

Preemptive scheduler


Now let's imagine the following picture. We have two threads of execution running the same program, and a scheduler that, at an arbitrary moment before any instruction, can interrupt the active thread and activate the other one. To manage such tasks, information about the thread's entry function and its parameters, as in the case of non-preemptive schedulers, is no longer enough. At a minimum, you also need to know the address of the instruction currently being executed and the set of local variables of each task. That is, for each task you must store copies of the corresponding variables, and since a thread's local variables live on its stack, space must be allocated for each thread's stack, and a pointer to the current stack position must be stored somewhere.

This data, the instruction pointer and the stack pointer, is stored in processor registers. Besides them, other information kept in registers is needed for correct operation: status flags, general-purpose registers holding temporary values, and so on. All of this together is called the processor context.

Processor context

The processor context (CPU context) is a data structure that stores the internal state of the processor's registers. The context must allow restoring the processor to the proper state to execute a particular thread. The process of replacing one thread with another is usually called a context switch.

Description of the context structure for the x86 architecture from our project:
struct context {
	/* 0x00 */ uint32_t eip;    /**< instruction pointer */
	/* 0x04 */ uint32_t ebx;    /**< base register */
	/* 0x08 */ uint32_t edi;    /**< destination index register */
	/* 0x0c */ uint32_t esi;    /**< source index register */
	/* 0x10 */ uint32_t ebp;    /**< stack base (frame) pointer register */
	/* 0x14 */ uint32_t esp;    /**< stack pointer register */
	/* 0x18 */ uint32_t eflags; /**< EFLAGS register holds the state of the processor */
};


The concepts of processor context and context switching are fundamental to preemptive scheduling.

Context switch

A context switch is the replacement of one thread's context with another's: the scheduler saves the current context and loads another one into the processor registers.
Above, I said that the scheduler can interrupt the active thread at any time, which somewhat simplifies the model. In reality, it is not the scheduler that interrupts the thread: the running program is interrupted by the processor in response to an external event, a hardware interrupt, which transfers control to the scheduler. Such an event is, for example, the system timer, which counts out the time slice allotted to the active thread. If we assume that the system has exactly one interrupt source, the system timer, then the CPU timeline consists of slices: the active thread runs until the next timer tick, at which point the scheduler may hand the processor to another thread.
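
Below is a minimal sketch of this mechanism, not code from our project; timer_irq_handler, irq_exit, TIME_SLICE_TICKS and need_resched are illustrative names. It only shows the idea that the timer interrupt, rather than the scheduler itself, takes control away from the active thread.

/* A sketch: the timer interrupt drives preemption. */
#define TIME_SLICE_TICKS 10 /* length of one time slice, in timer ticks */

static volatile unsigned int slice_left = TIME_SLICE_TICKS;
static volatile int need_resched = 0;

extern void schedule(void); /* discussed in the "Rescheduling" section below */

/* Called by the hardware on every system timer tick. */
void timer_irq_handler(void) {
	if (--slice_left == 0) {
		slice_left = TIME_SLICE_TICKS;
		need_resched = 1; /* the active thread has used up its quantum */
	}
}

/* Called on the way out of an interrupt, after all handlers have run;
 * this is where the interrupted thread can actually be switched out. */
void irq_exit(void) {
	if (need_resched) {
		need_resched = 0;
		schedule();
	}
}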

The context switch procedure for the x86 architecture:
	.global context_switch
context_switch:
	movl 0x04(%esp), %ecx        /* Point ecx to previous registers */
	movl (%esp), %eax            /* Get return address */
	movl %eax, CTX_X86_EIP(%ecx) /* Save it as eip */
	movl %ebx, CTX_X86_EBX(%ecx) /* Save ebx */
	movl %edi, CTX_X86_EDI(%ecx) /* Save edi */
	movl %esi, CTX_X86_ESI(%ecx) /* Save esi */
	movl %ebp, CTX_X86_EBP(%ecx) /* Save ebp */
	add $4, %esp                 /* Move esp in state corresponding to eip */
	movl %esp, CTX_X86_ESP(%ecx) /* Save esp */
	pushf                        /* Push flags */
	pop CTX_X86_EFLAGS(%ecx)     /* ...and save them */

	movl 0x04(%esp), %ecx        /* Point ecx to next registers */
	movl CTX_X86_EBX(%ecx), %ebx /* Restore ebx */
	movl CTX_X86_EDI(%ecx), %edi /* Restore edi */
	movl CTX_X86_ESI(%ecx), %esi /* Restore esi */
	movl CTX_X86_EBP(%ecx), %ebp /* Restore ebp */
	movl CTX_X86_ESP(%ecx), %esp /* Restore esp */
	push CTX_X86_EFLAGS(%ecx)    /* Push saved flags */
	popf                         /* Restore flags */
	movl CTX_X86_EIP(%ecx), %eax /* Get eip */
	push %eax                    /* Restore it as return address */
	ret


Thread state machine

We have discussed the important difference of the thread structure in the preemptive case from the non-preemptive one: the presence of a context. Let's see what happens to a thread from its creation to its termination:



But this does not exhaust the possible thread states. A thread can give up its time slice while waiting for an event, for example simply fall asleep for a while and, after that time passes, continue execution from the place where it fell asleep (the ordinary sleep call).
Thus, a thread can be in various states (ready for execution, terminating, waiting, and so on), whereas with the non-preemptive scheduler a flag marking a task as active was enough.

This is how you can imagine a generalized state machine:

In this diagram a new state, wait, has appeared. It tells the scheduler that the thread has fallen asleep and, until it wakes up, should not be given processor time.
Now let's take a closer look at the thread control API and deepen our knowledge of thread states.

State implementation

If you look at the state diagram more closely, you can see that the init and wait states are almost the same: both can pass into the ready state, that is, announce to the scheduler that the thread is ready to receive its time slice. Thus, the init state is redundant.

Now look at the exit state. It has its own subtleties. It is set for the scheduler in the termination function, which is discussed below. A thread can terminate by two scenarios: in the first, it completes its main function and frees the resources it occupies itself; in the second, another thread takes on the responsibility of freeing its resources. In the latter case, the thread sees that another thread will release its resources, informs it that it has finished, and passes control to the scheduler. In the former, the thread frees its resources itself and likewise passes control to the scheduler. Once the scheduler has received control, the thread must never resume execution. That is, in both cases the exit state means the same thing: the thread in this state does not want to receive a new time slice and must not be placed in the scheduler queue. Conceptually this is no different from the wait state, so a separate state is not needed.

Thus, we have three states. We will store them in three separate fields. It would be possible to keep everything in a single integer field, but to simplify the checks, and because of peculiarities of the multiprocessor case that we will not discuss here, this decision was made. So, the thread states:

- active: the thread is currently running on the CPU;
- ready: the thread is in the scheduler's ready queue, waiting for CPU time;
- waiting: the thread is waiting for some event; as discussed above, this state also absorbs init and exit.



Creation

Creating a thread includes the necessary initialization (the thread_init function) and, possibly, starting it. During initialization, memory is allocated for the stack, the processor context is set up, and the necessary flags and other initial values are set. Since during creation we work with the ready queue, which the scheduler may use at an arbitrary moment, we must block the scheduler from touching the thread structure until it is fully initialized. After initialization, the thread is in the waiting state, which, as we remember, also covers the initial state. After that, depending on the passed parameters, we either start the thread or not. The start function is the start/wake-up function of the scheduler; it is described in detail below. For now, we only say that it places the thread into the scheduler queue and changes its state from waiting to ready.
So, the code of the thread_create and thread_init functions:

struct thread *thread_create(unsigned int flags, void *(*run)(void *), void *arg) {
	int ret;
	struct thread *t;
	//...

	/* below we are going to work with thread instances and therefore we need
	 * to lock the scheduler (disable scheduling) to prevent the structure
	 * from being corrupted */
	sched_lock();
	{
		/* allocate memory */
		if (!(t = thread_alloc())) {
			t = err_ptr(ENOMEM);
			goto out;
		}

		/* initialize internal thread structure */
		thread_init(t, flags, run, arg);

		//...
	}
out:
	sched_unlock();

	return t;
}


void thread_init(struct thread *t, unsigned int flags,
		void *(*run)(void *), void *arg) {
	sched_priority_t priority;

	assert(t);
	assert(run);
	assert(thread_stack_get(t));
	assert(thread_stack_get_size(t));

	t->id = id_counter++; /* setup thread ID */

	dlist_init(&t->thread_link); /* default unlink value */

	t->critical_count = __CRITICAL_COUNT(CRITICAL_SCHED_LOCK);
	t->siglock = 0;
	t->lock = SPIN_UNLOCKED;
	t->ready = false;
	t->active = false;
	t->waiting = true;
	t->state = TS_INIT;

	/* set executive function and arguments pointer */
	t->run = run;
	t->run_arg = arg;

	t->joining = NULL;

	//...

	/* cpu context init */
	context_init(&t->context, true); /* setup default value of CPU registers */
	context_set_entry(&t->context, thread_trampoline); /* set entry (IP register) */

	/* setup stack pointer to the top of allocated memory
	 * The structure of kernel thread stack is as follows:
	 * +++++++++++++++ top
	 * |
	 * v
	 * the thread structure
	 * xxxxxxx
	 * the end
	 * +++++++++++++++ bottom (t->stack - allocated memory for the stack)
	 */
	context_set_stack(&t->context,
			thread_stack_get(t) + thread_stack_get_size(t));

	sigstate_init(&t->sigstate);

	/* Initializes scheduler strategy data of the thread */
	runq_item_init(&t->sched_attr.runq_link);
	sched_affinity_init(t);
	sched_timing_init(t);
}


Waiting mode

A thread can give up its time to another thread for some reason, for example by calling the sleep function; that is, the current thread goes from running to waiting. If with a non-preemptive scheduler we simply set an activity flag, here we keep the thread elsewhere: a waiting thread is not placed in the scheduler's ready queue. So as not to lose the thread, it is usually stored in a special queue of its own. For example, when a thread tries to acquire a busy mutex, before falling asleep it puts itself in the queue of threads waiting on that mutex. When the event the thread is waiting for occurs, for example the mutex is released, the other thread wakes it up and we can return it to the ready queue. More details about waiting and its pitfalls come below, after we deal with the scheduler code itself; a simplified sketch of the mutex scenario follows right after this paragraph.
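
To make this more concrete, here is a deliberately simplified sketch of the mutex scenario. It is not the project's mutex implementation: a single waiter field stands in for a real wait queue, and only the scheduler primitives named in this article (sched_lock/sched_unlock, sched_wait_prepare, schedule, sched_wakeup, thread_self) are assumed.

/* A simplified sketch: one waiter instead of a whole queue. */
struct mutex {
	int locked;
	struct thread *waiter; /* a real mutex keeps a queue of waiters */
};

void mutex_lock(struct mutex *m) {
	sched_lock();
	while (m->locked) {
		m->waiter = thread_self(); /* let the owner know whom to wake */
		sched_wait_prepare();      /* raise the waiting flag */
		sched_unlock();
		/* A wakeup arriving right here merely clears the waiting flag,
		 * so schedule() below will keep us in the ready queue. */
		schedule();
		sched_lock();
	}
	m->locked = 1;
	sched_unlock();
}

void mutex_unlock(struct mutex *m) {
	struct thread *t;

	sched_lock();
	m->locked = 0;
	if ((t = m->waiter) != NULL) {
		m->waiter = NULL;
		sched_wakeup(t); /* return the waiter to the ready queue */
	}
	sched_unlock();
}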

Thread termination

Here the thread ends up in the waiting state. If the thread has executed its handler function and terminated naturally, its resources must be freed. I already described this process in detail when discussing the redundancy of the exit state. Let's now look at the implementation of this function.

void __attribute__((noreturn)) thread_exit(void *ret) {
	struct thread *current = thread_self();
	struct task *task = task_self();
	struct thread *joining;

	/* We can not free the main thread */
	if (task->main_thread == current) {
		/* We are the last thread. */
		task_exit(ret);
		/* NOTREACHED */
	}

	sched_lock();

	current->waiting = true;
	current->state |= TS_EXITED;

	/* Wake up a joining thread (if any).
	 * Note that joining and run_ret are both in a union. */
	joining = current->joining;
	if (joining) {
		current->run_ret = ret;
		sched_wakeup(joining);
	}

	if (current->state & TS_DETACHED) {
		/* No one references this thread anymore. Time to delete it. */
		thread_delete(current);
	}

	schedule();

	/* NOTREACHED */
	sched_unlock(); /* just to be honest */
	panic("Returning from thread_exit()");
}


A trampoline for the handler function

We have said more than once that when a thread finishes execution, it must release its resources. Calling the thread_exit function by hand is undesirable: it is rarely necessary to terminate a thread exceptionally rather than naturally, by returning from its function. In addition, the initial context needs to be prepared, which is also unnecessary to do by hand each time. Therefore, the thread starts not with the function we specified at creation, but with the thread_trampoline wrapper function. It serves precisely to prepare the initial context and terminate the thread correctly.

static void __attribute__((noreturn)) thread_trampoline(void) {
	struct thread *current = thread_self();
	void *res;

	assert(!critical_allows(CRITICAL_SCHED_LOCK), "0x%x", (uint32_t)__critical_count);

	sched_ack_switched();

	assert(!critical_inside(CRITICAL_SCHED_LOCK));

	/* execute user function handler */
	res = current->run(current->run_arg);
	thread_exit(res);
	/* NOTREACHED */
}


Summary: the thread structure

So, to describe a task in the case of a preemptive scheduler we need a rather complicated structure. It contains:

- the processor context;
- the function the thread executes and its argument;
- the thread's state (active, ready, waiting) and auxiliary flags;
- a stack handle, a unique identifier, a link to the owning task, and scheduler-private data.
Correspondingly, the description of the structure in our project is as follows:
struct thread {
	unsigned int       critical_count;
	unsigned int       siglock;

	spinlock_t         lock;         /**< Protects wait state and others. */

	unsigned int       active;       /**< Running on a CPU. TODO SMP-only. */
	unsigned int       ready;        /**< Managed by the scheduler. */
	unsigned int       waiting;      /**< Waiting for an event. */

	unsigned int       state;        /**< Thread-specific state. */

	struct context     context;      /**< Architecture-dependent CPU state. */

	void            *(*run)(void *); /**< Start routine. */
	void              *run_arg;      /**< Argument to pass to start routine. */

	union {
		void          *run_ret;      /**< Return value of the routine. */
		void          *joining;      /**< A joining thread (if any). */
	} /* unnamed */;

	thread_stack_t     stack;        /**< Handler for work with thread stack */

	__thread_id_t      id;           /**< Unique identifier. */

	struct task       *task;         /**< Task belong to. */
	struct dlist_head  thread_link;  /**< list's link holding task threads. */

	struct sigstate    sigstate;     /**< Pending signal(s). */

	struct sched_attr  sched_attr;   /**< Scheduler-private data. */
	thread_local_t     local;
	thread_cancel_t    cleanups;
};

The structure has fields that are not described in the article (sigstate, local, cleanups); they are needed to support full-fledged POSIX threads (pthreads) but are not fundamental to this article.

Scheduler and scheduling strategy

Recall what we now have: a thread structure that includes a context, and the ability to switch that context. In addition, we have a system timer that measures out time slices. In other words, we have a working environment for the scheduler.
The scheduler's task is to distribute processor time among threads. The scheduler has a queue of ready threads, which it uses to determine the next active thread. The rules by which the scheduler picks the next thread to run are called the scheduling strategy. The main function of the scheduling strategy is to manage the ready queue: to add, remove, and extract the next ready thread. The behavior of the scheduler will depend on how these functions are implemented. Since we have managed to define a separate concept, the scheduling strategy, let's move it into a separate entity. We described its interface as follows:

extern void runq_init(runq_t *queue);
extern void runq_insert(runq_t *queue, struct thread *thread);
extern void runq_remove(runq_t *queue, struct thread *thread);
extern struct thread *runq_extract(runq_t *queue);
extern void runq_item_init(runq_item_t *runq_link);


Let's consider the implementation of a scheduling strategy in more detail.

An example scheduling strategy

As an example, I will analyze the most primitive scheduling strategy, so as to focus not on the subtleties of a particular strategy but on the specifics of a preemptive scheduler. In this strategy threads are handled in FIFO order, without priorities: a new thread, or one that has just used up its time slice, is placed at the end of the queue, and the thread to receive CPU time is taken from the beginning.
The queue is an ordinary doubly linked list: when we add an element, we insert it at the end, and when we extract one, we take it from the beginning and remove it.

void runq_item_init(runq_item_t *runq_link) {
	dlist_head_init(runq_link);
}

void runq_init(runq_t *queue) {
	dlist_init(queue);
}

void runq_insert(runq_t *queue, struct thread *thread) {
	dlist_add_prev(&thread->sched_attr.runq_link, queue);
}

void runq_remove(runq_t *queue, struct thread *thread) {
	dlist_del(&thread->sched_attr.runq_link);
}

struct thread *runq_extract(runq_t *queue) {
	struct thread *thread;

	thread = dlist_entry(queue->next, struct thread, sched_attr.runq_link);
	runq_remove(queue, thread);

	return thread;
}


Scheduler

Now we come to the most interesting part: the description of the scheduler.

Starting the scheduler

The first stage is the scheduler's initialization. Here we must provide the scheduler with a correct environment: prepare the ready queue, add the idle thread to it (the thread that runs when there is nothing else to execute), and start the timer that will count out the time slices for thread execution.
Scheduler startup code:
int sched_init(struct thread *idle, struct thread *current) {
	runq_init(&rq.queue);
	rq.lock = SPIN_UNLOCKED;

	sched_wakeup(idle);

	sched_ticker_init();

	return 0;
}


Awakening and starting a thread

As we remember from the description of the state machine, waking up and starting a thread is, for the scheduler, one and the same process. A call to this function occurs in the scheduler startup, where we start the idle thread. What actually happens at wake-up? First, the waiting mark is removed, that is, the thread is no longer in the waiting state. Then there are two possibilities: whether the thread has already managed to fall asleep or not. Why this happens I will describe in the next section, "Waiting". If it has not, the thread is still in the ready state, and in that case the wake-up is complete. Otherwise, we put the thread into the scheduler queue, clear the waiting flag, and set the ready flag. In addition, rescheduling is invoked if the priority of the awakened thread is higher than that of the current one. Note the various locks: the whole operation is performed with interrupts disabled. To see how waking and starting a thread works in the SMP case, I suggest referring to the project code.

/** Locks: IPL, thread. */
static int __sched_wakeup_ready(struct thread *t) {
	int ready;

	spin_protected_if (&rq.lock, (ready = t->ready))
		t->waiting = false;

	return ready;
}

/** Locks: IPL, thread. */
static void __sched_wakeup_waiting(struct thread *t) {
	assert(t && t->waiting);

	spin_lock(&rq.lock);
	__sched_enqueue_set_ready(t);
	__sched_wokenup_clear_waiting(t);
	spin_unlock(&rq.lock);
}

static inline void __sched_wakeup_smp_inactive(struct thread *t) {
	__sched_wakeup_waiting(t);
}

/** Called with IRQs off and thread lock held. */
int __sched_wakeup(struct thread *t) {
	int was_waiting = (t->waiting && t->waiting != TW_SMP_WAKING);

	if (was_waiting)
		if (!__sched_wakeup_ready(t))
			__sched_wakeup_smp_inactive(t);

	return was_waiting;
}

int sched_wakeup(struct thread *t) {
	assert(t);
	return SPIN_IPL_PROTECTED_DO(&t->lock, __sched_wakeup(t));
}


Waiting

Going into the waiting state and leaving it correctly (when the expected event finally occurs) is probably the most difficult and delicate part of preemptive scheduling. Let's look at the situation in more detail.
First of all, we must explain to the scheduler that we are going to wait for an event. The event occurs asynchronously, but we need to learn about it synchronously, so we must tell the scheduler how it can determine that the event has occurred. Meanwhile, we do not know when the event may happen: for example, we may have told the scheduler that we are waiting for an event and checked that its conditions have not yet been met, when at that very moment a hardware interrupt arrives and raises our event. Since we have already performed the check, this information would be lost. In our project we solved this problem as follows.
The waiting macro code:
#define SCHED_WAIT_TIMEOUT(cond_expr, timeout) \
	((cond_expr) ? 0 : ({                                            \
		int __wait_ret = 0;                                          \
		clock_t __wait_timeout = timeout == SCHED_TIMEOUT_INFINITE ? \
			SCHED_TIMEOUT_INFINITE : ms2jiffies(timeout);            \
		                                                             \
		threadsig_lock();                                            \
		do {                                                         \
			sched_wait_prepare();                                    \
			                                                         \
			if (cond_expr)                                           \
				break;                                               \
			                                                         \
			__wait_ret = sched_wait_timeout(__wait_timeout,          \
					&__wait_timeout);                                \
		} while (!__wait_ret);                                       \
		                                                             \
		sched_wait_cleanup();                                        \
		                                                             \
		threadsig_unlock();                                          \
		__wait_ret;                                                  \
	}))


A thread can be in a superposition of states. That is, when a thread falls asleep, it is still active and merely sets an additional waiting flag. On wake-up the flag is simply cleared again, and only if the thread has already managed to reach the scheduler and leave the ready queue is it added there anew. If we consider the situation described earlier, we get the following picture.

A - active
R - ready
W - wait

In the picture, the letters indicate which states are present: light green is the thread's state before wait_prepare, green after wait_prepare, and dark green after the rescheduling call.
If the event does not manage to occur before rescheduling, everything is simple: the thread falls asleep and will wait to be woken up.
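
For example, a hypothetical reader thread may wait for a flag raised by an interrupt handler. The names device_irq_handler, data_ready, reader and read_blocking are invented for illustration; only SCHED_WAIT_TIMEOUT, thread_self and sched_wakeup come from the code discussed in this article.

/* A hypothetical driver fragment built on the waiting macro above. */
static volatile int data_ready = 0;
static struct thread *reader; /* the thread that calls read_blocking() */

/* Runs in interrupt context when the device has data. */
void device_irq_handler(void) {
	data_ready = 1;
	sched_wakeup(reader); /* clears the waiting flag or re-enqueues the thread */
}

/* Runs in the reader thread; returns 0 on success or an error on timeout. */
int read_blocking(void) {
	int err;

	reader = thread_self();
	/* The condition is re-evaluated after every wakeup, so an event that
	 * fires between the check and falling asleep is not lost. */
	err = SCHED_WAIT_TIMEOUT(data_ready, 100 /* ms */);
	if (err) {
		return err;
	}
	data_ready = 0;

	return 0;
}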


Rescheduling

The scheduler's main task is scheduling, pardon the tautology. And we have finally come to the point where we can take apart how this process is implemented in our project.
First of all, rescheduling must be performed with the scheduler locked. Secondly, we want to be able to allow or forbid preemption of the thread. Therefore, we moved the working logic into a separate function, surrounded its call with locks, and called it with an indication that preemption is not allowed at this point.
Next come the operations on the ready queue. If the thread active at the moment of rescheduling is not about to fall asleep, that is, if it is not in the waiting state, we simply add it to the scheduler's ready queue. Then we extract the highest-priority thread from the queue; the rules for finding it are implemented by the scheduling strategy.
Then, if the current active thread is the same one we extracted from the queue, no rescheduling is needed and we can simply return and keep running it. If rescheduling is required, the sched_switch function is called, in which the actions needed by the scheduler are performed, chief among them the context_switch we examined above.
If the thread is about to fall asleep and is in the waiting state, it does not get into the scheduler queue, and the ready flag is cleared.
Finally, signals are handled but, as I noted above, that is beyond the scope of this article.

static void sched_switch(struct thread *prev, struct thread *next) {
	sched_prepare_switch(prev, next);

	trace_point(__func__);

	/* Preserve initial semantics of prev/next. */
	cpudata_var(saved_prev) = prev;
	thread_set_current(next);
	context_switch(&prev->context, &next->context); /* implies cc barrier */
	prev = cpudata_var(saved_prev);

	sched_finish_switch(prev);
}

static void __schedule(int preempt) {
	struct thread *prev, *next;
	ipl_t ipl;

	prev = thread_self();

	assert(!sched_in_interrupt());
	ipl = spin_lock_ipl(&rq.lock);

	if (!preempt && prev->waiting)
		prev->ready = false;
	else
		__sched_enqueue(prev);

	next = runq_extract(&rq.queue);

	spin_unlock(&rq.lock);

	if (prev != next)
		sched_switch(prev, next);

	ipl_restore(ipl);

	assert(thread_self() == prev);

	if (!prev->siglock) {
		thread_signal_handle();
	}
}

void schedule(void) {
	sched_lock();
	__schedule(0);
	sched_unlock();
}


Verifying multithreading

As an example, I used the following code:
#include <stdint.h>
#include <errno.h>
#include <stdio.h>
#include <util/array.h>
#include <kernel/thread.h>
#include <framework/example/self.h>

/**
 * This macro is used to register this example at the system.
 */
EMBOX_EXAMPLE(run);

/* configs */
#define CONF_THREADS_QUANTITY      0x8 /* number of executing threads */
#define CONF_HANDLER_REPEAT_NUMBER 300 /* number of circle loop repeats */

/** The thread handler function. It's used for each started thread */
static void *thread_handler(void *args) {
	int i;

	/* print a thread structure address and a thread's ID */
	for (i = 0; i < CONF_HANDLER_REPEAT_NUMBER; i++) {
		printf("%d", *(int *)args);
	}

	return thread_self();
}

/**
 * Example's executing routine
 * It has been declared by the macro EMBOX_EXAMPLE
 */
static int run(int argc, char **argv) {
	struct thread *thr[CONF_THREADS_QUANTITY];
	int data[CONF_THREADS_QUANTITY];
	void *ret;
	int i;

	/* starting all threads */
	for (i = 0; i < ARRAY_SIZE(thr); i++) {
		data[i] = i;
		thr[i] = thread_create(0, thread_handler, &data[i]);
	}

	/* waiting until all threads finish and print return value */
	for (i = 0; i < ARRAY_SIZE(thr); i++) {
		thread_join(thr[i], &ret);
	}

	printf("\n");

	return ENOERR;
}


In fact, this is almost an ordinary application. The EMBOX_EXAMPLE(run) macro declares the entry point for this kind of module in our project. The thread_join function waits until a thread finishes; I have not covered it here either, as there was already plenty for one article. A rough sketch of it is given below.
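
Since thread_join stayed outside the article, here is a rough sketch of what it might look like, pieced together from the joining field and the thread_exit function shown earlier. It is an illustration under those assumptions, not the project's actual code; a real implementation must also handle detached threads and SMP locking.

/* A sketch of joining a thread, consistent with thread_exit() above. */
int thread_join(struct thread *t, void **p_ret) {
	struct thread *current = thread_self();

	sched_lock();
	if (!(t->state & TS_EXITED)) {
		t->joining = current;    /* thread_exit() will sched_wakeup() us */
		current->waiting = true; /* sleep until the thread exits */
		schedule();
	}
	if (p_ret) {
		*p_ret = t->run_ret;     /* the value passed to thread_exit() */
	}
	thread_delete(t);            /* the joined thread may now be freed */
	sched_unlock();

	return 0;
}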
The result of running this example on QEMU as part of our project looks as follows.


As can be seen from the results, the threads created first are executed one after another: the scheduler gives them time in turn. There is some discrepancy toward the end. I think this is a consequence of the scheduler having a rather coarse resolution (not comparable with the time it takes to print a single character to the screen), so in the first passes the threads manage to complete different numbers of loop iterations.
In general, anyone who wants to play around can download the project and try everything described above in practice.
If the topic is interesting, I will try to continue the story about scheduling; quite a few topics are still left open.

Source: https://habr.com/ru/post/219431/

