
Parallelization of recursive functions using OpenMP 3.0 tasks

Recently I stumbled upon the blog post "OpenMP 3.0 tasking". After reading it, I decided to check how well recursive functions parallelize with OpenMP 3.0 tasks. Recall that before the third version of the standard there was no support for dynamic or irregular parallelism (for example, while loops or recursive functions).


Computing systems


I will begin with a description of the computing systems that I used.
Below is a list of the machines whose results I am allowed to publish; the rest are not yet in production.
[Image: specifications of the test machines]

Parallelization process


For testing, I took the classic example of computing Fibonacci numbers. The idea of the parallelization was as follows: create an asynchronous task for each recursion step, and synchronize before returning from the function. Unfortunately, with this approach I could not get a speedup above 1, because the runtime overhead dominated. The solution, found here, turned out to be simple: create tasks only for part of the recursion steps, and execute the rest sequentially. In other words, balance the runtime overhead against the useful work. Below is the piece of code responsible for this balancing:
INT_TYPE
fib( INT_TYPE n ) {
    INT_TYPE i, j;
    if( n < SPLITTER ) {
        /* Below the cutoff: plain recursion, no tasking overhead. */
        return fib_serial( n );
    } else {
        /* Above the cutoff: spawn the two recursive calls as tasks. */
        #pragma omp task shared( i )
        i = fib( n - 1 );
        #pragma omp task shared( j )
        j = fib( n - 2 );
        /* Wait for both child tasks before combining their results. */
        #pragma omp taskwait
        return i + j;
    }
}
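
The function fib_serial is not shown above; here is a minimal sketch of the obvious serial fallback (my reconstruction — the exact code is in the downloadable source):

/* Plain recursive Fibonacci, used below the SPLITTER cutoff.
 * Assumed implementation; not taken verbatim from the original source. */
INT_TYPE fib_serial( INT_TYPE n ) {
    if( n < 2 )
        return n;
    return fib_serial( n - 1 ) + fib_serial( n - 2 );
}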

In this example, SPLITTER is responsible for the balancing. The full source code can be downloaded here.
But not everything is that simple: the parallelization efficiency differs for different SPLITTER values. By essentially pointing a finger at the sky, I picked two SPLITTER values, 19 and 32, at which the parallelization efficiency approaches one.
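
One detail worth spelling out: the task-based fib must be started from inside a parallel region, with a single thread creating the root task. A minimal driver sketch under that assumption (INT_TYPE and SPLITTER are taken from the snippet above; the timing uses the standard omp_get_wtime):

#include <stdio.h>
#include <omp.h>

#define INT_TYPE unsigned long long
#define SPLITTER 32   /* cutoff between task creation and serial recursion */

INT_TYPE fib_serial( INT_TYPE n );   /* serial fallback, see above */
INT_TYPE fib( INT_TYPE n );          /* task-based version, see above */

int main( void ) {
    INT_TYPE n = 41, result;
    double t = omp_get_wtime();
    /* One thread creates the root task; the whole team then
     * executes the task tree. */
    #pragma omp parallel
    #pragma omp single
    result = fib( n );
    t = omp_get_wtime() - t;
    printf( "fib(%llu) = %llu, %.3f s on %d threads\n",
            n, result, t, omp_get_max_threads() );
    return 0;
}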

Compilers


Of all the software, only the compilers are of interest here. I had three at my disposal: Intel® Composer XE for Linux, Microsoft Visual C++ 2010, and GCC. Visual C++ was ruled out immediately because it does not support OpenMP 3.0 (at the moment it only supports OpenMP 2.5). GCC has supported OpenMP 3.0 since version 4.4; I used 4.5.0.
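
A quick way to check which OpenMP version a compiler actually implements is to print the standard _OPENMP macro; the value 200805 corresponds to OpenMP 3.0 and 200505 to OpenMP 2.5:

#include <stdio.h>

int main( void ) {
#ifdef _OPENMP
    /* 200805 = OpenMP 3.0, 200505 = OpenMP 2.5 */
    printf( "_OPENMP = %d\n", _OPENMP );
#else
    printf( "OpenMP is not enabled\n" );
#endif
    return 0;
}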
Below are the compile times and the sizes of the executables, built against both dynamic and static libraries. It is worth noting that, unlike GCC, Intel® Composer XE was installed on a remote machine, which inflates its compile time.

[Image: compile times and executable sizes]

The command lines for GCC and Intel® Composer XE are as follows:
# gcc ./fib_task.c -fopenmp
# gcc ./fib_task.c -fopenmp -static
# icc ./fib_task.c -openmp -static
# icc ./fib_task.c -openmp

Execution time and scalability


All measurements were taken while computing the 50th Fibonacci number, for SPLITTER = 19 and SPLITTER = 32.
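The number of threads in measurements like these is typically controlled with the standard OMP_NUM_THREADS environment variable, for example:
# OMP_NUM_THREADS=8 ./a.out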
[Image: execution times and scalability for SPLITTER = 19 and SPLITTER = 32]

Acceleration and Parallelization Efficiency


[Image: speedup for SPLITTER = 32]
This chart shows the speedup for SPLITTER = 32. Efficiency, in turn, is the ratio of the speedup to the number of threads.
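
For reference, the standard definitions: if T(1) is the single-thread execution time and T(p) the time on p threads, then the speedup is S(p) = T(1) / T(p) and the efficiency is E(p) = S(p) / p. For example, a speedup of 7.2 on 8 threads gives an efficiency of 7.2 / 8 = 0.9.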

Source and executable files


As mentioned above, the source file can be downloaded here. In this example, the 41st Fibonacci number is computed. To reproduce the measurements, change the variables to n = 50 and expected = 12586269025UL. You can also download the ready-made executables directly.

Findings


In fact, the execution-time and speedup charts speak for themselves.

I leave the remaining conclusions to you, since everyone is interested in different things.

Afterword


On SpringerLink I found the paper "An Extension to Improve OpenMP Tasking Control", which proposes a new final clause that acts as a balancer. In my view, this is a sensible proposal. One could also go further and add a balancer clause that performs adaptive balancing at run time.
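
For context: a final clause along these lines was later adopted in OpenMP 3.1. A sketch of how fib could use it (requires an OpenMP 3.1 compiler; this is my illustration, not code from the paper or the article):

INT_TYPE fib( INT_TYPE n ) {
    INT_TYPE i, j;
    if( n < 2 )
        return n;
    /* final(): once n drops below SPLITTER, this task and all of its
     * descendants become final, i.e. they execute immediately and
     * sequentially, with no further task-creation overhead. */
    #pragma omp task shared( i ) final( n < SPLITTER )
    i = fib( n - 1 );
    #pragma omp task shared( j ) final( n < SPLITTER )
    j = fib( n - 2 );
    #pragma omp taskwait
    return i + j;
}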


Source: https://habr.com/ru/post/99823/

