Auto-vectorization and auto-parallelization with Guided Auto-parallelization (GAP)

After publishing the post "New features of vectorization and parallelization in the Intel® Parallel Composer", I decided to follow the advice I gave in its comments myself: get access to the Intel® Parallel Composer and test one of its features. Since I had already been porting serial code to CEAN + Cilk, I settled on testing Guided Auto-parallelization (GAP). The first sentence of the GAP documentation, “ Guided auto-parallelization is the code that can be parallelized .”, reminded me of something I already knew from the compiler: the auto-vectorization and auto-parallelization diagnostic options -vec-report and -par-report. Read on to see how they differ.
A disclaimer up front: I used only what is available to beta testers and no inside information. I took an example from the presentation and experimented with it. First, I checked auto-vectorization:
#icl -c gaptestcase.cpp /Qvec-report3
...

gaptestcase.cpp(19) (col. 1): remark: routine skipped: no vectorization candidates.

Next, although I am not much of an auto-parallelization user (I prefer to parallelize code myself), I also checked how well the code auto-parallelizes:
#icl -c gaptestcase.cpp /Qparallel /Qpar-report3
This gave me nothing useful, so I turned to the GAP documentation. It turned out that the compiler options fall into three groups:

Enabling diagnostics for vectorization, parallelization, data transformation, or all of them together: -guide-vec, -guide-par, and -guide-data-trans (Linux* OS) or /Qguide-vec, /Qguide-par, and /Qguide-data-trans (Windows* OS); for all of them together, -guide[=n] (Linux* OS) or /Qguide[:n] (Windows* OS).
Selecting specific parts of the code for diagnostics, for example hotspots (the most frequently executed regions): -guide-opts=<arg>. For example: /Qguide-opts:"bar.f90,'module_1::routine_name'"
Redirecting the output: -guide-file=<file_name> and -guide-file-append[=<file_name>]. By default, all output goes to stderr, but it can be written to a file or appended to an existing file.

Then, in the words of Mike Naumenko:
“ First, a short excursion into history,
and we will move on to practice later (afterwards, that is) ”,
I moved on to practice. First, I checked what GAP says about auto-vectorizing the example:

#icl -c gaptestcase.cpp /Qguide-vec
Screen output:

GAP REPORT LOG OPENED ON Sat Jul 10 17:39:54 2010
remark #30761: Add -Qparallel option if you want the compiler to generate recommendations for improving auto-parallelization.
Recompile with -guide (Linux) or -Qguide (Windows) in both passes of IPO.
…\gaptestcase.cpp(27): remark #30534: (LOOP) Add -Qansi-alias option for better type-based disambiguation analysis by the compiler if appropriate (option will apply for entire compilation). This will improve optimizations for the loop at line 27 [VERIFY] Make sure that the semantics of this option is obeyed for entire compilation.
…\gaptestcase.cpp(27): remark #30513: (VECT) Use "#pragma ivdep" to vectorize the loop at line 27, if these arrays in the loop do not have unsafe cross-iteration dependencies: ?nodes@@3PAPAUTEST_STRUCT@@A, ?distances@@3PAMA. [VERIFY] A cross-iteration dependence exists if a memory location is modified in an iteration of a loop and accessed (a read or write) in another iteration of a loop. Make sure that there are no such dependencies, or that any cross-iteration dependencies can be safely ignored.
Number of advice-messages emitted for this compilation session: 2.
END OF GAP REPORT LOG

Sensible advice. After adding #pragma ivdep to the code and /Qansi-alias to the compiler options:

#icl -c gaptestcase.cpp /Qvec-report3 /Qansi-alias
…\gaptestcase.cpp(27) (col. 3): remark: LOOP WAS VECTORIZED.
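
The [VERIFY] note in remark #30513 deserves attention: #pragma ivdep is a promise made by the programmer, not something the compiler checks. As a minimal sketch of the distinction, the two hypothetical functions below (scale and running_sum are illustrative names, not part of the test case) contrast a loop with independent iterations against one with a cross-iteration dependence. The reversed flag runs the iterations in the opposite order; if the result changes, the iterations are certainly not independent. This is only a rough sanity check, not a proof of safety.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i].
// Promising "no cross-iteration dependence" would be safe for this loop.
std::vector<float> scale(const std::vector<float>& in, bool reversed) {
    const std::size_t n = in.size();
    std::vector<float> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = reversed ? n - 1 - k : k;
        out[i] = 2.0f * in[i];
    }
    return out;
}

// Cross-iteration dependence: iteration i reads a[i - 1],
// which iteration i - 1 has just modified.
// Telling the compiler to ignore this dependence would be wrong.
std::vector<float> running_sum(std::vector<float> a, bool reversed) {
    const std::size_t n = a.size();
    for (std::size_t k = 1; k < n; ++k) {
        const std::size_t i = reversed ? n - k : k;
        a[i] += a[i - 1];
    }
    return a;
}
```

For the same input, scale returns identical vectors in both orders, while running_sum does not, which is exactly the kind of loop where the advice must not be followed blindly.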

After that, I checked auto-parallelization:
#icl -c gaptestcase.cpp /Qguide-par /Qparallel /Qansi-alias
Screen output:

GAP REPORT LOG OPENED ON Sat Jul 10 17:36:46 2010
…\gaptestcase.cpp(27): remark #30525: (PAR) If the trip count of the loop at line 27 is greater than 16, then use "#pragma loop count min(16)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 16 iterations.
Number of advice-messages emitted for this compilation session: 2.
END OF GAP REPORT LOG

Everything checks out and works.
As you can see, GAP made a good showing in this example. It gave practical advice:

use the -Qansi-alias compiler option;
add the directives #pragma ivdep and #pragma loop count min(16).

This allowed the compiler to auto-vectorize and auto-parallelize the example.
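
The source of gaptestcase.cpp is not reproduced here, so the following is only a hypothetical sketch of what the annotated loop might look like. The mangled names in remark #30513 appear to decode, under the Microsoft C++ name-mangling scheme, to a global TEST_STRUCT** nodes and a global float* distances; everything else (the value field, the compute_distances function, the loop body) is an assumption made for illustration. The pragmas are guarded with __INTEL_COMPILER, which the classic Intel compiler defines, so that other compilers simply skip them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: only the names TEST_STRUCT, nodes, and distances
// come from the GAP log; the field and the loop body are assumptions.
struct TEST_STRUCT {
    float value;
};

TEST_STRUCT** nodes = nullptr;  // global, as the symbol in the log suggests
float* distances = nullptr;     // global, as the symbol in the log suggests

void compute_distances(std::size_t n) {
    // [VERIFY] for remark #30525: the loop has at least 16 iterations.
    assert(n >= 16);
#if defined(__INTEL_COMPILER)
    // Promises from the programmer, not checks by the compiler:
    // no unsafe cross-iteration dependencies, and a trip count of 16 or more.
#pragma ivdep
#pragma loop count min(16)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        distances[i] = nodes[i]->value * 2.0f;
    }
}
```

The assert turns the trip-count [VERIFY] condition into a debug-time check; the dependence condition still has to be verified by reading the code, since distances must not overlap the memory reached through nodes.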
Lastly, for fans of Microsoft Visual Studio 2010: GAP, like the Composer itself, integrates into the IDE.

With GAP enabled there, the output goes to the standard Output window.
That's all. Thank you for your attention.

Please refer to the Optimization Notice page for more details on performance and optimization in Intel software products.

Source: https://habr.com/ru/post/98768/
