Parallel Notes №4 - continuing our introduction to OpenMP constructs

Let's continue our introduction to OpenMP technology and consider some features and new directives.

OpenMP has a number of helper functions. To use them, do not forget to include the header file <omp.h>.

Functions of the Runtime Environment

These functions allow you to query and set various parameters of the OpenMP environment:

omp_get_num_procs - returns the number of processors (cores) available on the computer.
omp_in_parallel - lets a thread find out whether it is currently executing inside a parallel region.
omp_get_num_threads - returns the number of threads in the current team.
omp_set_num_threads - sets the number of threads that will execute the next parallel region encountered by the current thread. The function can help distribute resources. For example, if we process sound and video simultaneously on a four-core processor, we can create one thread for sound and three for video.
omp_get_max_threads - returns the maximum number of threads that may be used in the next parallel region.
omp_set_nested - enables or disables nested parallelism. If nested parallelism is enabled, each thread that encounters a parallel region creates a new team of threads to execute it and becomes the master thread of that team.
omp_get_nested - returns whether nested parallelism is enabled.

Of the functions listed above, those whose names begin with omp_set_ may be called only outside parallel regions. All the others can be used both inside and outside parallel regions.

Synchronization / Lock Functions

OpenMP allows you to build parallel code without using these functions, since there are directives that provide certain types of synchronization. However, in some cases these functions are convenient and even necessary.

OpenMP has two types of locks: simple and nested. The functions for nested locks have "nest" in their names. A lock can be in one of three states: uninitialized, locked, or unlocked.

omp_init_lock / omp_init_nest_lock - initializes a variable of type omp_lock_t / omp_nest_lock_t. Analogous to InitializeCriticalSection.
omp_destroy_lock / omp_destroy_nest_lock - releases a variable of type omp_lock_t / omp_nest_lock_t. Analogous to DeleteCriticalSection.
omp_set_lock / omp_set_nest_lock - acquires the lock. One thread acquires it, and the other threads that call this function wait until the owner releases the lock with omp_unset_lock(). Analogous to EnterCriticalSection.
omp_unset_lock / omp_unset_nest_lock - releases the lock. Analogous to LeaveCriticalSection.
omp_test_lock / omp_test_nest_lock - a non-blocking attempt to acquire the lock. If the attempt succeeds, the function returns a nonzero value (for a nested lock, the new nesting count). If the lock cannot be acquired, it returns 0. Analogous to TryEnterCriticalSection.

Simple locks (omp_lock_t) cannot be set more than once, even by the same thread. Nested locks (omp_nest_lock_t) are identical to simple ones, except that a thread is not blocked when it tries to set a lock it already owns.

Here is an example that uses these functions. Each thread in turn prints "Begin work" and "End work". Between these two messages from one thread, other threads may print messages reporting a failed attempt to acquire the lock.

  // Requires <omp.h>, <stdio.h> and <windows.h> (for Sleep)
  omp_lock_t lock;
  int n;
  omp_init_lock(&lock);
  #pragma omp parallel private(n)
  {
    n = omp_get_thread_num();
    // Poll the lock without blocking and report each failed attempt
    while (!omp_test_lock(&lock))
    {
      printf("Wait ..., thread %d\n", n);
      Sleep(3);
    }
    printf("Begin work, thread %d\n", n);
    Sleep(5);  // Work ...
    printf("End work, thread %d\n", n);
    omp_unset_lock(&lock);
  }
  omp_destroy_lock(&lock);

On a machine with four cores, the output may look like this:

Begin work, thread 0
Wait ..., thread 1
Wait ..., thread 2
Wait ..., thread 3
Wait ..., thread 2
Wait ..., thread 3
Wait ..., thread 1
End work, thread 0
Begin work, thread 2
Wait ..., thread 3
Wait ..., thread 1
Wait ..., thread 3
Wait ..., thread 1
End work, thread 2
Begin work, thread 3
Wait ..., thread 1
Wait ..., thread 1
End work, thread 3
Begin work, thread 1
End work, thread 1

Timer Functions

omp_get_wtime - returns, in the calling thread, the wall-clock time in seconds (as a double) that has elapsed since some fixed point in the past. If a section of the program is surrounded by calls to this function, the difference between the returned values gives the running time of that section.
omp_get_wtick - returns, in the calling thread, the timer resolution in seconds, that is, the precision of the timer.

That concludes our overview of the functions. Now let's look at two new clauses of the parallel directive. They can be thought of as options for creating parallel regions.

if (condition)

Conditional execution of a parallel region. Several threads are created only when the condition is met. If the condition is not met, the code is executed sequentially.

Example of use:

  void test(bool x)
  {
    #pragma omp parallel if (x)
    if (omp_in_parallel())
    {
      #pragma omp single
      printf_s("parallelized with %d threads\n",
               omp_get_num_threads());
    }
    else
    {
      printf_s("single thread\n");
    }
  }

  int _tmain(int argc, _TCHAR *argv[])
  {
    test(false);
    test(true);
    return 0;
  }

Output:

  single thread
 parallelized with 4 threads

num_threads

Explicitly specifies the number of threads that will execute the parallel region. If the clause is omitted, the last value set with the omp_set_num_threads() function is used.

If we modify the example above as follows:

  ...
  #pragma omp parallel if (x) num_threads(3)
  ...

then we get the following output:

  single thread
 parallelized with 3 threads

In the next issue of "Parallel Notes" we will continue ...

Source: https://habr.com/ru/post/86820/
