
Here is another note in the series introducing the OpenMP parallel programming technology. This time we consider the "atomic" and "reduction" directives.
Atomic directive
Consider code that sums the elements of an array:
intptr_t A[1000], sum = 0;
for (intptr_t i = 0; i < 1000; i++)
  A[i] = i;
for (intptr_t i = 0; i < 1000; i++)
  sum += A[i];
printf("Sum = %Ii\n", sum);
The result of this code is:
Sum = 499500
Press any key to continue . . .
Let's try to parallelize this code using the "#pragma omp parallel for" directive:
#pragma omp parallel for
for (intptr_t i = 0; i < 1000; i++)
  sum += A[i];
Unfortunately, this parallelization is incorrect: a race condition arises during execution. Several threads will try to read and write the sum variable simultaneously. The sequence of accesses may look like this:
The value of the variable sum = 500;
The value of i in the first thread = 1;
The value of i in the second thread = 501;
Thread 1: processor register = sum
Thread 2: processor register = sum
Thread 1: processor register += i
Thread 2: processor register += i
Thread 2: sum = processor register
Thread 1: sum = processor register
As a result, sum = 501 instead of 1002.
The incorrectness of this parallelization can also be observed in practice by running the demo code. In particular, I got:
Sum = 486904
Press any key to continue . . .
Critical sections can be used to prevent errors when updating shared variables. However, if the shared variable "sum" is updated by a statement of the form sum = sum + expr, the more convenient tool is the "atomic" directive. The "atomic" directive works faster than critical sections, since some atomic operations can be directly replaced by processor instructions.
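For comparison, here is a minimal sketch of the critical-section variant, using the same A and sum as above. It is correct, but relies on heavier synchronization:
#pragma omp parallel for
for (intptr_t i = 0; i < 1000; i++)
{
  // The critical section serializes the entire update of sum
  #pragma omp critical
  sum += A[i];
}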
The "atomic" directive applies to the assignment statement immediately following it, ensuring correct work with the shared variable on its left-hand side. While that statement executes, access to this variable is blocked for all other running threads except the thread performing the operation.
The "atomic" directive can be applied only to operations of the following forms:
- X BINOP= EXPR
- X++
- ++X
- X--
- --X
Here X is a scalar variable, EXPR is an expression with scalar types that does not reference X, and BINOP is one of the non-overloaded operators +, *, -, /, &, ^, |, <<, >>. In all other cases the "atomic" directive cannot be used.
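To illustrate, here are a few statements to which "atomic" can legally be applied (x and n are hypothetical scalar variables introduced just for this sketch):
int x = 0, n = 1;
#pragma omp atomic
x += n * 2;   // X BINOP= EXPR; the right-hand side does not reference x
#pragma omp atomic
x++;          // X++
#pragma omp atomic
--x;          // --X
#pragma omp atomic
x <<= 1;      // the shift operators are also allowed as BINOP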
The revised code looks like this:
#pragma omp parallel for
for (intptr_t i = 0; i < 1000; i++)
{
  #pragma omp atomic
  sum += A[i];
}
This solution gives the correct result but is extremely inefficient. The above code will run slower than the sequential version: locks will constantly occur during execution, so the cores will spend almost all their time waiting. The "atomic" directive is used in this example only to demonstrate how it works. In practice, this directive makes sense when accesses to the shared variable are relatively rare. Example:
unsigned count = 0;
#pragma omp parallel for
for (intptr_t i = 0; i < N; i++)
{
  // Slow function
  if (SlowFunction())
  {
    #pragma omp atomic
    count++;
  }
}
It should be remembered that in the statement to which the "atomic" directive is applied, only the work with the variable on the left-hand side of the assignment is atomic; the calculations on the right-hand side are not required to be atomic. Consider an example where the "atomic" directive has no effect on the calls to the functions used in the expression:
class Example
{
public:
  unsigned m_value;
  Example() : m_value(0) {}
  unsigned GetValue()
  {
    return ++m_value;
  }
  unsigned GetSum()
  {
    unsigned sum = 0;
    #pragma omp parallel for
    for (ptrdiff_t i = 0; i < 100; i++)
    {
      #pragma omp atomic
      sum += GetValue();
    }
    return sum;
  }
};
This example contains a race condition, and the value it returns may vary from run to run. The increment of the "sum" variable is protected by the "atomic" directive, but "atomic" does not affect the call to the GetValue() function. The calls occur in parallel threads, which leads to errors when executing the "++m_value" operation inside GetValue.
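One possible way to repair the example (a sketch, not the only option) is to rewrite GetValue using a critical section; a plain "atomic" does not fit here, since it cannot both modify m_value and return its new value:
unsigned GetValue()
{
  unsigned value;
  // The critical section protects both the increment and the read of the result
  #pragma omp critical
  {
    value = ++m_value;
  }
  return value;
}
With this change, every call returns a unique value from 1 to 100, and GetSum() deterministically returns 5050.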
Reduction directive
A logical question arises: how, then, can the elements of an array be summed quickly? The "reduction" directive will help.
The format of the directive: reduction(operator: list)
The possible operators are "+", "*", "-", "&", "|", "^", "&&", "||".
list is a comma-separated list of names of shared variables. The variables must have a scalar type (for example, float, int, or long, but not std::vector, int[], etc.).
Principle of operation:
- A local copy of each variable is created for each thread.
- The local copies are initialized according to the operator: 0 or its analogue for additive operations, 1 or its analogue for multiplicative operations. See also Table N1.
- After all statements in the parallel region have executed, the reduction operator is applied to the local copies of the variables. The order in which the operators are applied is not defined.
Table N1 - reduction operators and the initial values of the local copies:
Operator    Initial value
+           0
*           1
-           0
&           ~0
|           0
^           0
&&          1
||          0
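As an illustration of a non-additive operator, here is a small sketch (the variable name is hypothetical) that checks, using "&&", whether all elements of the array A from the examples above are non-negative; the local copies of the variable start at 1 (true), as the table shows:
int all_nonnegative = 1;
#pragma omp parallel for reduction(&&: all_nonnegative)
for (intptr_t i = 0; i < 1000; i++)
  all_nonnegative = all_nonnegative && (A[i] >= 0);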
Now, using "reduction", the efficient code looks like this:
#pragma omp parallel for reduction(+: sum)
for (intptr_t i = 0; i < 1000; i++)
  sum += A[i];
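To make the principle described above concrete, here is a sketch of what reduction(+: sum) is roughly equivalent to when written by hand:
#pragma omp parallel
{
  intptr_t local_sum = 0;       // private copy; initialized to 0 for "+"
  #pragma omp for
  for (intptr_t i = 0; i < 1000; i++)
    local_sum += A[i];          // no synchronization needed inside the loop
  #pragma omp atomic            // the copies are combined once per thread
  sum += local_sum;
}
The locking now happens only once per thread rather than once per array element, which is why this version scales well.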
In the next issue of "Parallel Notes" we will continue...