
Parallel notes No. 3 - the basic constructs of OpenMP

Let us begin working with OpenMP itself and look at some of its basic constructs in this note.

When using OpenMP, we add two kinds of constructs to the program: functions of the OpenMP runtime library and special #pragma directives.

Functions


OpenMP functions play a mostly auxiliary role, since parallelism itself is implemented through directives. Nevertheless, in some cases they are very useful and even necessary. The functions can be divided into three categories: runtime environment functions, lock/synchronization functions, and timer functions. All of them have names beginning with omp_ and are declared in the header file omp.h. We will return to these functions in later notes.
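As a small taste of what these functions look like, here is a minimal sketch (the request for four threads is purely illustrative) that uses a couple of the environment functions together with the omp_get_wtime timer:

  #include <omp.h>
  #include <cstdio>

  int main()
  {
    omp_set_num_threads(4);                  // environment function: request 4 threads
    printf("max threads: %d\n", omp_get_max_threads());

    double t0 = omp_get_wtime();             // timer function: wall-clock time in seconds

    #pragma omp parallel
    {
      // each thread can ask for its own number and the size of the team
      printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }

    printf("elapsed: %f s\n", omp_get_wtime() - t0);
    return 0;
  }

To actually enable OpenMP, the code has to be built with the compiler's OpenMP option, for example -fopenmp for GCC/Clang or /openmp for Visual C++.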

Directives


The C/C++ #pragma construct is used to give additional instructions to the compiler. With such constructs you can specify how data in structures should be aligned, suppress particular warnings, and so on. The general form is:
  #pragma <directive>

The special keyword "omp" indicates that the directives are related to OpenMP. Thus, the #pragma directives for working with OpenMP have the following format:
  #pragma omp <directive> [clause [[,] clause] ...]

Like any other #pragma directives, they are ignored by compilers that do not support the technology, and the program still compiles without errors as a sequential one. This property makes it possible to write highly portable code based on OpenMP: code containing OpenMP directives can be compiled by a C/C++ compiler that knows nothing about the technology and will simply run sequentially, which is better than maintaining two branches of the code or scattering #ifdef blocks everywhere.
OpenMP provides the directives parallel, for, section, sections, single, master, critical, flush, ordered and atomic, clauses such as private and shared, and a number of other constructs that define work-sharing and synchronization mechanisms.
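Note that the runtime functions, unlike the directives, do not silently disappear on a compiler without OpenMP support, so calls to them are sometimes guarded with the standard _OPENMP macro that OpenMP-aware compilers define. A small sketch of this approach (the messages are only illustrative):

  // An OpenMP-aware compiler defines the _OPENMP macro, so calls to the
  // runtime library can be guarded while the #pragma lines stay untouched.
  #ifdef _OPENMP
    #include <omp.h>
  #endif
  #include <cstdio>

  int main()
  {
  #ifdef _OPENMP
    printf("OpenMP enabled, up to %d threads\n", omp_get_max_threads());
  #else
    printf("Compiled without OpenMP, running sequentially\n");
  #endif
    return 0;
  }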

The parallel directive


Perhaps the most important directive is parallel. It creates a parallel region for the structured block that follows it, for example:

  #pragma omp parallel [clause [[,] clause] ...]
    structured block

The parallel directive specifies that the code block should be executed in parallel by several threads. Each created thread executes the same code contained in the block, but not necessarily the same sequence of instructions: different threads can take different branches or process different data, depending on operators such as if-else or the use of work-sharing directives.
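For instance, in the sketch below (the split of the work between thread 0 and the rest is invented purely for illustration) every thread executes the same block, but an if-else on the thread number sends them down different branches:

  #include <omp.h>
  #include <cstdio>

  int main()
  {
    #pragma omp parallel
    {
      // all threads run this block, but the branch taken depends on the thread number
      if (omp_get_thread_num() == 0)
        printf("thread 0 takes one branch\n");
      else
        printf("thread %d takes the other branch\n", omp_get_thread_num());
    }
    return 0;
  }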

To demonstrate the launch of several threads, let us print text inside a parallel region:
  #pragma omp parallel
  {
    cout << "OpenMP Test" << endl;
  }


On a quad-core machine, we can expect to see output like this:

  OpenMP Test
 OpenMP Test
 OpenMP Test
 OpenMP Test 


But in practice I got the following output:

  OpenMP TestOpenMP Test
 OpenMP Test

 OpenMP Test 

This is caused by several threads sharing a single resource. Here we print text from four threads to one console, and the threads do not coordinate the order of their output in any way. We are witnessing a race condition.

A race condition is a design or implementation error in a multitasking system in which the behaviour of the system depends on the order in which parts of the code happen to execute. This kind of error is the most common one in parallel programming and is very insidious: reproducing and localizing it is often difficult because of the inconstancy of its manifestation (see also the term Heisenbug).
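One simple way to repair the output in the example above is the critical directive from the list given earlier: only one thread at a time is allowed to enter a critical section, so the lines come out whole, at the price of serializing that piece of work. A minimal sketch:

  #include <iostream>
  using namespace std;

  int main()
  {
    #pragma omp parallel
    {
      // only one thread at a time enters the critical section,
      // so the output lines are no longer interleaved
      #pragma omp critical
      {
        cout << "OpenMP Test" << endl;
      }
    }
    return 0;
  }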

The for directive


The example above demonstrates that parallelism is present, but by itself it is useless. Now let us actually benefit from parallelism. Suppose we need to take the square root of each element of an array and put the result into another array:
  void VSqrt(double *src, double *dst, ptrdiff_t n)
  {
    for (ptrdiff_t i = 0; i < n; i++)
      dst[i] = sqrt(src[i]);
  }


If we write:
  #pragma omp parallel
  {
    for (ptrdiff_t i = 0; i < n; i++)
      dst[i] = sqrt(src[i]);
  }


then instead of a speedup we will do a lot of extra work: every thread will compute the square roots of all the elements of the array. To parallelize the loop we need the "for" work-sharing directive. The #pragma omp for directive states that when the for loop is executed inside a parallel region, its iterations should be distributed among the threads of the team:
  #pragma omp parallel
  {
    #pragma omp for
    for (ptrdiff_t i = 0; i < n; i++)
      dst[i] = sqrt(src[i]);
  }

Now each created thread will process only the part of the array given to it. For example, if we have 8000 elements, then on a machine with four cores the work might be distributed as follows: in the first thread the variable i takes values from 0 to 1999, in the second from 2000 to 3999, in the third from 4000 to 5999, and in the fourth from 6000 to 7999. In theory we get a 4x speedup; in practice the speedup will be somewhat smaller because of the cost of creating the threads and waiting for them to finish. At the end of the parallel region a barrier synchronization is performed: on reaching the end of the region, every thread is blocked until the last thread finishes its work.
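The exact split is up to the implementation's scheduling, but it is easy to observe: the sketch below (the 8000-element count and the helper arrays are only for illustration) records, for each thread, the first and last iteration it executed; with the usual static schedule the ranges come out contiguous, as described above:

  #include <omp.h>
  #include <cstdio>

  int main()
  {
    const int n = 8000;
    const int maxThreads = 64;               // generous upper bound for this sketch
    int first[maxThreads], last[maxThreads];
    for (int t = 0; t < maxThreads; t++)
      first[t] = last[t] = -1;

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
      int t = omp_get_thread_num();
      if (first[t] == -1) first[t] = i;      // remember the first iteration this thread saw
      last[t] = i;                           // and keep updating the last one
    }

    for (int t = 0; t < maxThreads; t++)
      if (first[t] != -1)
        printf("thread %d: iterations %d..%d\n", t, first[t], last[t]);
    return 0;
  }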

You can use a shorthand notation that combines several directives into a single control line. The parallel region with the nested for directive shown above is equivalent to:
  #pragma omp parallel for
  for (ptrdiff_t i = 0; i < n; i++)
    dst[i] = sqrt(src[i]);
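To check that the parallel loop actually pays off on a given machine, the omp_get_wtime timer function mentioned earlier can be wrapped around it. A rough sketch (the array size and the fill value are arbitrary):

  #include <omp.h>
  #include <math.h>
  #include <stddef.h>
  #include <cstdio>
  #include <vector>

  void VSqrt(double *src, double *dst, ptrdiff_t n)
  {
    #pragma omp parallel for
    for (ptrdiff_t i = 0; i < n; i++)
      dst[i] = sqrt(src[i]);
  }

  int main()
  {
    const ptrdiff_t n = 10000000;
    std::vector<double> src(n, 2.0), dst(n);

    double t0 = omp_get_wtime();             // wall-clock time before the loop
    VSqrt(src.data(), dst.data(), n);
    printf("elapsed: %f s\n", omp_get_wtime() - t0);
    return 0;
  }

Running the same program built with and without the OpenMP compiler option gives a quick estimate of the real speedup.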


The private and shared clauses


With respect to parallel regions, data can be shared or private. Private data belongs to one thread and can be modified only by that thread; shared data is available to all threads. In the example above the arrays were shared data. A variable declared outside a parallel region is considered shared by default, while one declared inside is private. Suppose that to compute the square root we need an intermediate variable value:
  double value;
  #pragma omp parallel for
  for (ptrdiff_t i = 0; i < n; i++)
  {
    value = sqrt(src[i]);
    dst[i] = value;
  }

In this code the value variable is declared outside the parallel region defined by "#pragma omp parallel for", so it is shared. As a result, value is used by all the threads at once, which leads to a race condition, and we get garbage in the output.

To give each thread its own copy of the variable, we can use two approaches. The first is to declare the variable inside the parallel region:
  #pragma omp parallel for
  for (ptrdiff_t i = 0; i < n; i++)
  {
    double value;
    value = sqrt(src[i]);
    dst[i] = value;
  }

The second is to use the private clause. Now each thread works with its own copy of the value variable:
  double value;
  #pragma omp parallel for private(value)
  for (ptrdiff_t i = 0; i < n; i++)
  {
    value = sqrt(src[i]);
    dst[i] = value;
  }

In addition to the private clause there is a shared clause. It is rarely written out, since without it every variable declared outside the parallel region is shared anyway; it can, however, be used to make the code more explicit.
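For instance, the shared clause can be written out next to private purely for readability, even though src, dst and n would be shared here by default anyway (the function name below is just an illustrative variant of VSqrt from earlier):

  #include <math.h>
  #include <stddef.h>

  void VSqrtExplicit(double *src, double *dst, ptrdiff_t n)
  {
    double value;
    // the data-sharing of every variable is now visible at a glance
    #pragma omp parallel for private(value) shared(src, dst, n)
    for (ptrdiff_t i = 0; i < n; i++)
    {
      value = sqrt(src[i]);
      dst[i] = value;
    }
  }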

We have covered only a small part of the OpenMP directives and will continue to get acquainted with them in the following notes.

Source: https://habr.com/ru/post/85273/

