What are threads?

Inspired by the previous article on this topic.
To structure your understanding of what threads are, and of how they differ from processes, you can use the following two definitions. (The word “thread” is translated into Russian as “нити”, literally “strands”, almost everywhere except in books on the Win32 API, where it is translated as “потоки”, literally “streams”.)

A thread is a virtual processor with its own set of registers, analogous to the registers of a real central processor. One of the most important registers of a virtual processor, as of a real one, is its own pointer to the current instruction (for example, its own EIP register on x86 processors).
A process is primarily an address space. On modern architectures the OS kernel creates it by manipulating page tables. Secondarily, a process should be viewed as an anchor point for OS “resources”. When we analyze multitasking in order to understand the essence of threads, we do not yet need to think about OS “resources” such as files, or about what they are attached to.

It is very important to understand that a thread is conceptually a virtual processor. When we write an implementation of threads in the OS kernel or in a user-level library, we are solving the problem of “replicating” the central processor into many virtual instances that logically, or even physically (on SMP, SMT, and multi-core CPU platforms), run in parallel with each other.
At the basic, conceptual level there is no “context”. A context is simply the name of the data structure in which the OS kernel or our thread library saves the registers of a virtual processor when it switches between virtual processors, emulating their parallel operation. Context switching is a way to implement threads, not a more fundamental concept through which a thread should be defined.
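To make the idea concrete, here is a minimal toy sketch in Python of one “physical” processor multiplexed between several virtual ones. Everything in it is hypothetical and invented for illustration: the two-register machine (`ip`, `acc`), the `add`/`mul` instruction set, and the names `Context`, `Cpu`, and `run_round_robin`. It does not model any real kernel. It only shows that a context is an ordinary data structure that holds saved registers.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Saved registers of one virtual processor (hypothetical layout)."""
    ip: int = 0    # instruction pointer, analogous to EIP on x86
    acc: int = 0   # a single general-purpose register

class Cpu:
    """The one 'physical' processor: it has exactly one live register set."""
    def __init__(self):
        self.ip = 0
        self.acc = 0

    def load(self, ctx):
        # Restore a virtual processor's registers into the real ones.
        self.ip, self.acc = ctx.ip, ctx.acc

    def save(self, ctx):
        # Store the real registers back into the virtual processor's context.
        ctx.ip, ctx.acc = self.ip, self.acc

    def step(self, program):
        """Execute one instruction; return False once the program has ended."""
        if self.ip >= len(program):
            return False
        op, arg = program[self.ip]
        if op == "add":
            self.acc += arg
        elif op == "mul":
            self.acc *= arg
        self.ip += 1
        return True

def run_round_robin(programs):
    """Interleave the programs one instruction at a time on a single Cpu."""
    cpu = Cpu()
    contexts = [Context() for _ in programs]
    runnable = list(range(len(programs)))
    while runnable:
        for tid in list(runnable):
            cpu.load(contexts[tid])           # switch in
            alive = cpu.step(programs[tid])
            cpu.save(contexts[tid])           # switch out
            if not alive:
                runnable.remove(tid)
    return [ctx.acc for ctx in contexts]

program_a = [("add", 2), ("mul", 10), ("add", 1)]   # ((0 + 2) * 10) + 1
program_b = [("add", 5), ("add", 5)]                # 0 + 5 + 5
results = run_round_robin([program_a, program_b])
print(results)  # [21, 10]
```

Each program computes the same result it would compute if it had the processor to itself, although the two are interleaved instruction by instruction. That is the sense in which each thread behaves like its own processor.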
When the concept of a thread is defined by analyzing the APIs of specific operating systems, too many entities are usually introduced at once: processes, address spaces, contexts, context switches, timer interrupts, time slices with priorities, and even “resources” attached to processes (as opposed to threads). All of this is woven into one tangle, and we often find ourselves going in circles as we read the definitions. Alas, this is a common way to explain the essence of threads in books, but it greatly confuses novice programmers and ties their understanding to implementation specifics.
Of course, all of these terms have a right to exist and did not arise by chance; each of them stands for something important. But we need to separate the primary entities from the secondary ones, which were introduced to implement the primary entities or were layered on top of them at higher levels of abstraction.
The main idea of a thread is the virtualization of the CPU registers: the emulation of several logical processors on one physical processor, each with its own register state (including the instruction pointer) and each running in parallel with the others.
The main feature of a process, in the context of this discussion, is that it has its own page tables, which form its individual address space. A process is not in itself something that executes.
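The role of page tables can be illustrated with a small toy model, again in Python. It is only a sketch under stated assumptions: a single-level page table represented as a dictionary, a 4096-byte page size, and the hypothetical names `AddressSpace`, `translate`, `store`, and `load`. Real page tables are multi-level hardware structures. The point is only that the same virtual address leads to different physical memory in different address spaces, while everything that runs inside one address space sees the same memory.

```python
PAGE_SIZE = 4096          # assumed page size for this toy model
physical_memory = {}      # (frame, offset) -> value; stands in for RAM

class AddressSpace:
    """A process's page table: virtual page number -> physical frame number."""
    def __init__(self, mapping):
        self.page_table = dict(mapping)

def translate(space, vaddr):
    """Turn a virtual address into a (frame, offset) physical location."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    return space.page_table[page], offset

def store(space, vaddr, value):
    physical_memory[translate(space, vaddr)] = value

def load(space, vaddr):
    # Untouched memory reads as 0 in this model.
    return physical_memory.get(translate(space, vaddr), 0)

# Two processes map the same virtual page 0 to different physical frames.
space_a = AddressSpace({0: 7})
space_b = AddressSpace({0: 9})

# Two threads of process A are modelled simply as two users of space_a.
store(space_a, 0x10, 111)                     # thread 1 of A writes
seen_by_thread2_of_a = load(space_a, 0x10)    # thread 2 of A reads it back
seen_by_process_b = load(space_b, 0x10)       # same address, other process

print(seen_by_thread2_of_a, seen_by_process_b)  # 111 0
```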
The definition may state that “every process in the system always has at least one thread”. After all, an address space is logically meaningless to the user if it is not visible to at least one virtual processor (thread). It is therefore logical that all modern OSs destroy the address space (terminate the process) when the last thread working in that address space finishes. But the definition of a process may just as well omit the claim that it has “at least one thread”, all the more so because at the lower system level a process can (as a rule) exist as an OS object even with no threads in it.
If you look at the source code of, for example, the Windows kernel, you will see that the address space and the other process structures are constructed before the initial thread of that process is created. In fact, initially there are no threads in the process at all. In Windows you can even create a thread in a foreign address space through the user-level API ...
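The lifecycle described above can be sketched as a toy model. The names `Process`, `ToyThread`, `create_thread`, and `exit_thread` are hypothetical and do not correspond to the Windows kernel's actual structures or API. The sketch only mirrors the order of events: the address space exists first, threads are attached to it later, and the address space is torn down when the last thread exits.

```python
class Process:
    """Toy process object: an address space plus a set of attached threads."""
    def __init__(self, name):
        self.name = name
        self.page_table = {}      # the address space exists from the start
        self.threads = set()      # no threads yet
        self.destroyed = False

class ToyThread:
    """Toy thread: a register set bound to some process's address space."""
    def __init__(self, process, entry_point):
        self.process = process
        self.ip = entry_point     # instruction pointer of the virtual processor

def create_thread(process, entry_point):
    # The target may be any live process, not only the caller's own.
    if process.destroyed:
        raise RuntimeError("address space no longer exists")
    thread = ToyThread(process, entry_point)
    process.threads.add(thread)
    return thread

def exit_thread(thread):
    process = thread.process
    process.threads.discard(thread)
    if not process.threads:
        # The last thread is gone: tear down the address space.
        process.page_table.clear()
        process.destroyed = True

proc = Process("demo")
threads_at_creation = len(proc.threads)        # the process exists, thread-less
t1 = create_thread(proc, entry_point=0x1000)
t2 = create_thread(proc, entry_point=0x2000)
exit_thread(t1)
alive_after_first_exit = not proc.destroyed    # one thread still remains
exit_thread(t2)
alive_after_last_exit = not proc.destroyed     # the last thread has exited

print(threads_at_creation, alive_after_first_exit, alive_after_last_exit)
# 0 True False
```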
If you look at a thread as a virtual processor, then binding it to an address space amounts to loading the desired value into the virtual page-table base register. :) Moreover, at the lower level this is exactly what happens: every time it switches to a thread that belongs to another process, the OS kernel reloads the register that points to the page tables (on processors that do not support working with multiple address spaces at the hardware level).
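The reload of the page-table base register can also be shown in a toy form. The sketch below assumes a hypothetical `Cpu` class with a `page_table_base` field (playing the role that the CR3 register plays on x86) and threads represented as plain dictionaries. The addresses and names are invented for illustration. It counts how many times the register has to be reloaded for a given schedule.

```python
class Cpu:
    """Toy CPU that tracks only which address space is currently active."""
    def __init__(self):
        self.page_table_base = None   # plays the role of CR3 on x86
        self.current = None
        self.reloads = 0

    def switch_to(self, thread):
        base = thread["process"]["page_table_base"]
        if base != self.page_table_base:
            # The next thread lives in another address space: reload.
            self.page_table_base = base
            self.reloads += 1
        self.current = thread["name"]

# Hypothetical processes, identified by where their page tables live.
process_a = {"page_table_base": 0x1000}
process_b = {"page_table_base": 0x2000}

a1 = {"name": "a1", "process": process_a}
a2 = {"name": "a2", "process": process_a}
b1 = {"name": "b1", "process": process_b}

cpu = Cpu()
for thread in [a1, a2, b1, a1]:
    cpu.switch_to(thread)

print(cpu.reloads)  # 3: initial load for a1, then a2 -> b1, then b1 -> a1
```

Switching from `a1` to `a2` costs no reload in this model, because both threads are bound to the same address space. Only a switch that crosses a process boundary changes the register.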

Source: https://habr.com/ru/post/40275/
