
Parallel notes N5 - getting further acquainted with OpenMP constructs

I bring to your attention another note introducing the OpenMP parallel programming technology. This time we consider the atomic and reduction directives.



Atomic directive

Consider code that sums the elements of an array:
  intptr_t A[1000], sum = 0;
  for (intptr_t i = 0; i < 1000; i++)
    A[i] = i;
  for (intptr_t i = 0; i < 1000; i++)
    sum += A[i];
  printf("Sum = %Ii\n", sum);

The result of this code is:

  Sum = 499500
  Press any key to continue . . .


Let's try to parallelize this code using the "parallel for" directive:

  #pragma omp parallel for
  for (intptr_t i = 0; i < 1000; i++)
    sum += A[i];


Unfortunately, such parallelization is incorrect: a race condition arises during execution. Several threads attempt to read and write the sum variable at the same time. The sequence of accesses may look like this (recall that A[i] == i in this example):

  The value of the variable sum = 500;
  The value of i in the first thread = 1;
  The value of i in the second thread = 501;

  Thread 1: processor register = sum
  Thread 2: processor register = sum
  Thread 1: processor register += i
  Thread 2: processor register += i
  Thread 2: sum = processor register
  Thread 1: sum = processor register

  The resulting value of sum is 501, not 1002.


The incorrectness of this parallelization is also easy to observe in practice by running the demo code. In particular, I got:

  Sum = 486904
  Press any key to continue . . .
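For readers who want to reproduce the race themselves, here is a minimal complete sketch of the demo (the %Ii format specifier is MSVC-specific, following the article's output; compile with OpenMP support enabled, e.g. /openmp or -fopenmp):

  #include <stdio.h>
  #include <stdint.h>

  int main()
  {
    intptr_t A[1000], sum = 0;
    for (intptr_t i = 0; i < 1000; i++)
      A[i] = i;

    // Incorrect parallelization: the threads update 'sum' without
    // synchronization, so the printed value varies between runs.
    #pragma omp parallel for
    for (intptr_t i = 0; i < 1000; i++)
      sum += A[i];

    printf("Sum = %Ii\n", sum);
    return 0;
  }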


Critical sections can be used to prevent errors when updating shared variables. However, when the shared variable "sum" is updated by a statement of the form sum = sum + expr, a more convenient tool is the "atomic" directive. The "atomic" directive works faster than critical sections, since some atomic operations map directly to processor instructions.
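For comparison, here is a sketch of the same update protected by a critical section. It is correct, but a critical section takes a general-purpose lock, so it is usually slower than atomic:

  #pragma omp parallel for
  for (intptr_t i = 0; i < 1000; i++)
  {
    // The critical section serializes the whole block across threads.
    #pragma omp critical
    {
      sum += A[i];
    }
  }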

This directive applies to the assignment statement immediately following it and ensures correct work with the shared variable on its left-hand side. While that statement executes, access to the variable is blocked for all other running threads, except the thread performing the operation.

The atomic directive applies only to operations of the following forms:

  x BINOP = EXPR
  x++
  ++x
  x--
  --x

Here x is a scalar variable, EXPR is an expression with scalar types that does not reference the variable x, and BINOP is one of the non-overloaded operators +, *, -, /, &, ^, |, <<, >>. In all other cases the atomic directive cannot be used.
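A few illustrative examples of what does and does not fit these forms (a sketch of mine; the variables are hypothetical):

  int x = 0, y = 5;

  #pragma omp atomic
  x += 2 * y;   // valid: x BINOP= EXPR, and x does not appear in EXPR

  #pragma omp atomic
  x++;          // valid: one of the increment/decrement forms

  // Invalid: a plain assignment whose right-hand side uses x,
  // which is not one of the allowed forms.
  // #pragma omp atomic
  // x = x * x + 1;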

The revised code looks like this:

  #pragma omp parallel for
  for (intptr_t i = 0; i < 1000; i++)
  {
    #pragma omp atomic
    sum += A[i];
  }


This solution gives the correct result but is extremely inefficient; the code above will run slower than the sequential version. Locks arise constantly while the algorithm works, so the cores spend almost all their time waiting. The atomic directive is used in this example only to demonstrate how it works. In practice, it makes sense when accesses to the shared variable are relatively rare. Example:

  unsigned count = 0;
  #pragma omp parallel for
  for (intptr_t i = 0; i < N; i++)
  {
    // Slow function
    if (SlowFunction())
    {
      #pragma omp atomic
      count++;
    }
  }


Keep in mind that in the statement to which the "atomic" directive is applied, only the update of the variable on the left-hand side of the assignment is atomic; the computation of the right-hand side is not. Consider an example where the atomic directive does not affect the functions called in the expression:

  class Example
  {
  public:
    unsigned m_value;
    Example() : m_value(0) {}
    unsigned GetValue()
    {
      return ++m_value;
    }
    unsigned GetSum()
    {
      unsigned sum = 0;
      #pragma omp parallel for
      for (ptrdiff_t i = 0; i < 100; i++)
      {
        #pragma omp atomic
        sum += GetValue();
      }
      return sum;
    }
  };


This example contains a race condition, and the value it returns may vary from run to run. The increment of the "sum" variable is protected by the "atomic" directive, but that directive does not affect the call to the GetValue() function. The calls happen in parallel threads, which leads to errors when executing the "++m_value" operation inside GetValue.
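One possible way to repair this example (a sketch of mine, not the original article's code) is to synchronize the increment-and-read inside GetValue itself, for instance with a critical section:

  unsigned GetValue()
  {
    unsigned result;
    // The increment and the read of the new value must happen together,
    // so a critical section is used here: atomic would protect only the
    // update itself, not the read-back of the incremented value.
    #pragma omp critical
    {
      result = ++m_value;
    }
    return result;
  }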

Reduction directive

A natural question arises: how, then, can the elements of an array be summed quickly? The "reduction" directive will help.

Directive format: reduction(operator : list)

The possible operators are "+", "*", "-", "&", "|", "^", "&&", "||".

list enumerates the names of shared variables. The variables must have a scalar type (for example, float, int, or long, but not std::vector, int[], and so on).

Principle of operation:
  1. A local copy of each listed variable is created in each thread.
  2. The local copies are initialized according to the operator: 0 (or its analog) for additive operations, 1 (or its analog) for multiplicative ones. See also table N1.
  3. After all the statements in the parallel region have executed, the reduction operator is applied to the local copies, and the result is combined with the original variable. The order in which the partial results are combined is not defined.


  Operator   Initial value of local copy
  +          0
  *          1
  -          0
  &          ~0
  |          0
  ^          0
  &&         1
  ||         0

Table N1 - reduction operators
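To make the mechanism concrete, here is a hedged sketch of what reduction(+ : sum) effectively does, written out by hand with a local partial sum per thread:

  intptr_t sum = 0;
  #pragma omp parallel
  {
    // Steps 1-2: each thread gets a local copy, initialized to 0 for "+".
    intptr_t local_sum = 0;

    #pragma omp for
    for (intptr_t i = 0; i < 1000; i++)
      local_sum += A[i];

    // Step 3: the partial results are combined into the shared variable;
    // the combining order is not defined, so the update is protected.
    #pragma omp atomic
    sum += local_sum;
  }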

Now, using "reduction", the efficient version of the code looks like this:

  #pragma omp parallel for reduction(+ : sum)
  for (intptr_t i = 0; i < 1000; i++)
    sum += A[i];
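Putting it all together, a complete compilable sketch of the reduction version (again assuming the MSVC-style %Ii specifier used throughout the article):

  #include <stdio.h>
  #include <stdint.h>

  int main()
  {
    intptr_t A[1000], sum = 0;
    for (intptr_t i = 0; i < 1000; i++)
      A[i] = i;

    // Each thread accumulates into its own copy of 'sum';
    // the copies are added together after the loop finishes.
    #pragma omp parallel for reduction(+ : sum)
    for (intptr_t i = 0; i < 1000; i++)
      sum += A[i];

    printf("Sum = %Ii\n", sum);  // Expected: Sum = 499500
    return 0;
  }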


In the next issue of "Parallel Notes" we will continue ...

Source: https://habr.com/ru/post/88574/

