Pthreads: POSIX Threads

Modern operating systems and microprocessors have long supported multitasking, and each task can in turn run in several threads. This gives a tangible performance gain and lets user applications and servers scale better, but it comes at a price: programs become harder to develop and debug.

This article introduces POSIX Threads to show how they work on Linux. Without venturing into the thickets of synchronization and signals, we will look at the main elements of Pthreads. So, threads under the hood.

General information

Multiple paths of execution within a single process are called threads. A thread is the basic unit of CPU utilization and consists of a thread identifier, a program counter, a register set, and a stack. Threads within one process share the code section, the data section, and various resources: open file descriptors, process credentials, signals, the umask, nice values, timers, and so on.

Every running process has at least one thread of execution. Some processes stop there, in cases where additional threads would not improve performance and would only complicate the program. However, the share of such programs shrinks every day.

What are multiple threads of execution good for? Take a heavily loaded web server, for example habrahabr.ru. If the server created a separate process to serve each HTTP request, we would wait forever for a page to load. Creating a new process is expensive for the OS. Even with the copy-on-write optimization, the fork and exec system calls have to set up new page tables and a new copy of the file descriptor table. In general, the OS kernel can create a new thread an order of magnitude faster than a new process.

The kernel uses copy-on-write for the data, stack, and heap segments of the parent process. Because processes often call fork and then exec right away, copying their pages during fork would be wasted work: the copies would be discarded after exec anyway. Initially, the child's page table entries point to the same physical memory pages as the parent's, and the pages themselves are marked read-only. A page is copied only at the moment one of the processes tries to modify it.

Figure: page tables before and after a shared memory page is modified under copy-on-write.

The number of parallel threads of execution, the program's algorithm, and the resulting performance gain are related. This relationship is known as Amdahl's law.

Figure: Amdahl's law for parallel processing.

Amdahl's law, speedup = 1 / (F + (1 - F) / N), gives the maximum speedup of a system with N processors, where the factor F is the fraction of the work that cannot be parallelized. For example, suppose 75% of the code runs in parallel and 25% runs serially (F = 0.25). Then the program speeds up 1.6 times on a dual-core processor and about 2.29 times on a quad-core processor, and the speedup approaches a limit of 4 as N tends to infinity.

Mapping threads to kernel mode

Virtually all modern operating systems, including Windows, Linux, Mac OS X, and Solaris, support kernel-level threads. However, threads can be created not only at the kernel level but also at the user level. In that case the kernel does not know the threads exist, and all thread management is done by the application through special libraries. User threads can be mapped to kernel threads in different ways. There are three models in total, of which 1:1 is the most widely used.

N:1 mapping

In this model, several user threads are mapped to a single OS kernel thread. All thread management is done by a user-space library, which is the advantage of this approach. The disadvantage is that if any one thread makes a blocking call, the whole process blocks. Earlier versions of Solaris used this model but later had to abandon it.

1:1 mapping

This is the simplest model: each thread created in a process is scheduled directly by the OS kernel scheduler and is mapped to exactly one kernel thread. To keep an application from spawning threads without limit and overloading the system, the OS caps the maximum number of threads it supports. Linux and Windows use this mapping.

M:N mapping

With this approach, M user threads are multiplexed onto the same or a smaller number N of kernel threads. This overcomes the drawbacks of the other two models: threads really do run in parallel, and the OS does not need to limit their total number. However, this model is quite difficult to implement.

POSIX threads

In the late 1980s and early 1990s there were several different threading APIs, but in 1995 POSIX.1c standardized POSIX threads, which later became part of the SUSv3 specification. Today multi-core processors have reached even desktops and smartphones, so most machines have hardware support for running several threads simultaneously. In the old days, simultaneous execution of threads on single-core CPUs was only an illusion, albeit an impressively inventive and very effective one.

Pthreads defines a set of types and functions in C.

pthread_t - thread identifier;
pthread_mutex_t - mutex;
pthread_mutexattr_t - mutex attributes object;
pthread_cond_t - condition variable;
pthread_condattr_t - condition variable attributes object;
pthread_key_t - key for thread-specific data;
pthread_once_t - control context for one-time (dynamic) initialization;
pthread_attr_t - thread attributes object.

In the traditional Unix API, errno, the code of the last error, is a global int variable. That does not work for programs with multiple threads of execution. If a function call in one thread fails and sets a global errno, a race condition can arise: other threads may be checking the error code at that very moment and be misled. Unix and Linux get around this by defining errno as a macro that expands to a separate lvalue for each thread.

From man errno:
errno is defined by the ISO C standard to be a modifiable lvalue of type int, and must not be explicitly declared; errno may be a macro. errno is thread-local; setting it in one thread does not affect its value in any other thread.

Creating a thread

First, a thread function is written. Then a new thread is created with the pthread_create() function, declared in the pthread.h header file. After that, the caller continues with its own work in parallel with the thread function.

    #include <pthread.h>

    int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                       void *(*start)(void *), void *arg);

On success, pthread_create() returns 0; a non-zero value indicates an error.

The first parameter of pthread_create(), thread, is the address where the pthread_t identifier of the new thread is stored.
The start argument is a pointer to the thread function, which takes an untyped pointer (void *) as its only parameter and returns void *.
The arg argument is an untyped pointer to the thread function's arguments. Most often arg points to a global or dynamically allocated variable; if the thread function needs no arguments, pass NULL as arg.
The attr argument is a pointer to a pthread_attr_t thread attributes object. If it is NULL, the thread is created with default attributes.

Consider now an example of a multithreaded program.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>              /* atoi() */

    int count;                       /* data shared by the threads */

    void *potok(void *param);        /* thread function */

    int main(int argc, char *argv[])
    {
        pthread_t tid;               /* thread identifier */
        pthread_attr_t attr;         /* thread attributes */

        if (argc != 2) {
            fprintf(stderr, "usage: progtest <integer value>\n");
            return -1;
        }
        if (atoi(argv[1]) < 0) {
            fprintf(stderr, "argument %d must be non-negative\n", atoi(argv[1]));
            return -1;
        }

        /* get the default attribute values */
        pthread_attr_init(&attr);
        /* create the new thread */
        pthread_create(&tid, &attr, potok, argv[1]);
        /* wait for the thread to finish */
        pthread_join(tid, NULL);
        printf("count = %d\n", count);
        return 0;
    }

    /* thread function: sums the integers from 1 to upper */
    void *potok(void *param)
    {
        int i, upper = atoi(param);

        count = 0;
        if (upper > 0) {
            for (i = 1; i <= upper; i++)
                count += i;
        }
        pthread_exit(0);
    }

To link the Pthreads library into a program, pass the -lpthread option to the linker, placing it after the source file.

 gcc -std=c99 -o progtest progtest.c -lpthread

Joining a thread with pthread_join() is covered a bit later. The line pthread_t tid declares the thread identifier. The thread attributes are initialized by pthread_attr_init(&attr). Since we did not set any of them explicitly, the default values are used.

Terminating a thread

A thread terminates when:

the thread function executes return and returns the result of its computation;
the thread calls pthread_exit();
the thread is cancelled with pthread_cancel();
any thread of the process calls exit();
the main thread executes return in main(); in these last two cases all threads of the process are terminated abruptly.

The syntax is simpler than for creating a thread.

    #include <pthread.h>

    void pthread_exit(void *retval);

If, in the last case, the main thread calls pthread_exit() from main() instead of exit() or return, the remaining threads keep running as if nothing had happened.

Waiting for a thread

The pthread_join() function waits for the thread identified by THREAD_ID to terminate. If that thread has already terminated, the function returns immediately. The purpose of the function is to synchronize threads. It is declared in pthread.h as follows:

    #include <pthread.h>

    int pthread_join(pthread_t THREAD_ID, void **DATA);

On success, pthread_join() returns 0; a non-zero value indicates an error.

If the DATA pointer is not NULL, the value the thread returned through pthread_exit() or through the return statement of the thread function is stored there. Several threads cannot wait for the same thread to terminate. If they try, one thread succeeds and all the others fail with an ESRCH error. After pthread_join() returns, the stack space associated with the thread can be reused by the application.

In a sense, pthread_join() resembles waitpid(), which waits for a process to terminate, but there are differences. First, all threads are peers with no hierarchy among them, whereas processes form a tree and obey a parent-child hierarchy. So it is possible for thread A to spawn thread B, which in turn spawns C, and then for A to call pthread_join() and wait for C, or vice versa. Second, you cannot wait for any thread to terminate, as you can with waitpid(-1, &status, options). Nor is it possible to make a non-blocking pthread_join() call.

Early termination of a thread

Just as it is sometimes necessary to terminate a process early, a multithreaded program may need to terminate one of its threads early. The pthread_cancel function serves this purpose.

 int pthread_cancel (pthread_t THREAD_ID);

On success, pthread_cancel() returns 0; a non-zero value indicates an error.

It is important to understand that although pthread_cancel() returns immediately and can terminate a thread early, it is not a way to force a thread to terminate. The thread can not only choose the moment at which it terminates in response to pthread_cancel(), it can ignore the request altogether. A call to pthread_cancel() should be seen as a request for early termination. Therefore, if you need to be sure the thread is gone, wait for it to finish with pthread_join().

A small illustration of creating and canceling a thread.

    pthread_t tid;      /* thread identifier */

    pthread_create(&tid, 0, worker, NULL);
    …
    /* request early termination of the thread */
    pthread_cancel(tid);

So as not to give the impression that the outcome of this call is arbitrary and unpredictable, consider the table of parameters that determine how a thread behaves when it receives a cancellation request.

As we can see, a thread can be made completely non-cancellable, and the default behavior is deferred cancellation, which takes effect only when the thread reaches a cancellation point. And how does a thread reach such a point? For that there is the helper function pthread_testcancel.

    while (1) {
        /* some computation */
        /* --- */
        /* has anyone asked us to cancel? */
        pthread_testcancel();
    }

Detaching a thread

By default, any thread can be joined by calling pthread_join() and waiting for it to terminate. In some cases, however, the thread's exit status and return value are of no interest. All we need is for the thread to finish and for its resources to be released back to the OS automatically. In such cases we mark the thread as detached with the pthread_detach() call.

    #include <pthread.h>

    int pthread_detach(pthread_t thread);

On success, pthread_detach() returns 0; a non-zero value indicates an error.

Detaching a thread is final. It can no longer be joined with pthread_join() to obtain its exit status and other goodies, and its detached state cannot be undone. A tricky question: what happens if a thread's termination is never collected with pthread_join(), and how does that differ from the case where a detached thread terminates? In the first case we get a zombie thread; in the second, everything is fine.

Threads versus processes

Finally, a few considerations on the question: should an application be designed as multithreaded, or should it run as several single-threaded processes? First, the advantages of multiple threads.

These advantages were already covered at the beginning of the article, so we simply list them briefly.

Threads can exchange data quite easily compared to processes.
Creating a thread is easier and faster for the OS than creating a process.

Now a little about the shortcomings.

When programming with multiple threads, you must make sure the functions you use are thread-safe. Applications built from multiple processes have no such requirement.
One buggy thread can damage the others, since threads share a common address space. Processes are better isolated from each other.
Threads compete with each other for address space. Each thread's stack and thread-local storage take up part of the process's virtual address space, making it unavailable to other threads. For embedded devices this limitation can be significant.

The topic of threads is almost bottomless, and even the basics could fill a couple of lectures, but we already know enough to study the structure of multithreaded applications on Linux.

Source: https://habr.com/ru/post/326138/
