
"The Simdsons" - a little about the SIMD vector instruction family

When I first saw this cartoon, I was not impressed at all. Some yellow (well, at least not green) little people with bulging eyes, a plot that is not particularly interesting, jokes that are not funny ...

But soon everything changed radically. No, “The Simpsons” stayed the same, but I spent about a month on a business trip to the Simpsons' world, the USA, where I finally understood why many consider this series the best. “The Simpsons” is a truly beautiful parody of the American way of life, from small things to global issues; it offers humor, philosophy, and many more excellent reasons to watch.

Why am I telling you this? Because the first acquaintance with the SIMD family of vector instructions (and SSE in particular) probably left many programmers unimpressed as well. Some new instructions with bulging long registers that operate on a whole group of data at once: a lot of hassle, and probably little benefit ...

I will try to change this view radically. No, I am not going to persuade you that SSE is an excellent tool for optimizing applications. I will go another way. “The Simpsons” is in its 21st season (by the way, it is the longest-running TV series in the history of American TV). In honor of that, here are 21 interesting facts about Intel SIMD. I hope they really are interesting, even for SIMD connoisseurs.

  1. In 2012, instead of the end of the world, we will celebrate a round date: 16 years since the appearance of the first set of Intel vector instructions, MMX (57 instructions for working with integer data packed into 64-bit vectors).
  2. Intel SIMD support (generation of the corresponding vectorized code) is present in every C/C++ compiler I know of. Moreover, for the last few years it has even arrived ahead of the hardware: processors with the newer Intel AVX vector instructions are not on the market yet, but Microsoft Visual Studio 2010 already supports them, so you can start using them.
  3. Computers without SIMD are now the rarest of exceptions. According to the very popular digital distribution service Steam (almost a thousand games and collections are distributed through Steam, and the number of active users exceeds 15 million), in April 2010, with Intel and AMD CPUs split roughly 70% to 30%, 98.40% of users' computers supported SSE2, and 95.17% supported SSE3 and SSE4!
  4. The Microsoft, Intel, and GCC C/C++ compilers also support intrinsic functions for working with SIMD. To the programmer they look like ordinary C functions that perform arithmetic, logical, and some auxiliary operations on a group of 2-16 numbers (depending on the data type, the operation, and the instruction set used), for example _mm_add_ps(a, b), which adds four pairs of single-precision numbers. To the compiler, however, an intrinsic is not a function at all but simply an “assembler macro”, so no time is spent on a call. Intrinsics are well documented both in MSDN and in the Intel Compiler help.
  5. Another possible interface to SIMD when using the Intel Compiler is its class library, the Intel C++ Class Libraries. It contains classes for both integer and floating-point vectors (I32vec4, F32vec4, I8vec16, ..) whose methods and overloaded operators use MMX, SSE, and SSE2 intrinsics. But while intrinsics are quite popular with developers, the vector classes are not, although they make the code prettier and simpler with absolutely negligible overhead. Most likely the reason is code portability. With intrinsics, conditional compilation is the usual approach (something like #ifdef USE_SIMD {intrinsics} #else {regular code}), whereas with classes this seems a little harder to do. But only at first glance. The vector classes are implemented right in the header files that declare them, so editing those headers (which ones: see the Intel Compiler documentation) to your taste, for example by adding a non-SIMD version, is elementary.
  6. To find out how well the Intel compiler managed to vectorize your code, use the -vec-report[n] option (/Qvec-report[n] on Windows), where n is the level of detail of the report. The compiler can not only list which loops were vectorized and which were not but, most importantly, explain the reason for the latter. For example: “Existence of vector dependency”, “Condition too Complex” (the condition of an if statement is too complicated), “Mixed Data Types” (different data types are mixed in one operation), “Not Inner Loop” (the loop is not the innermost one, and only innermost loops are vectorized), and even “Vectorization possible but it seems inefficient” (vectorization is possible but would not improve performance).
  7. If you disagree with the Intel compiler's opinion on vectorizing a specific loop, you can overrule it with special pragmas that give finer control over vectorization. Namely, #pragma novector prohibits vectorization, while “forced” vectorization is requested with #pragma ivdep (ignore vector dependencies, i.e. ignore possible dependencies of later loop iterations on earlier ones) and #pragma vector, whose arguments allow or forbid the compiler to use the faster instructions for moving aligned data or streaming stores. See the Intel compiler documentation for full details.
  8. In the upcoming release of the Intel C/C++ compiler the auto-vectorizer will be improved: in particular, it will report not only why a loop could not be vectorized but also advice on how to fix the situation. Another pragma for controlling loop vectorization is planned as well: #pragma simd, with a variety of possible arguments, will let the programmer control vectorization more flexibly than the existing #pragma ivdep. Details will follow later.
  9. If you use a non-Intel compiler, or if you only have an executable file but want to know which SIMD instructions the compiler generated and how many of them, there is a simple and convenient tool. Simd Check, a completely free utility written by an Intel engineer, shows SIMD usage statistics for an exe file. Read the details and download Simd Check here .
  10. Another way to find out (albeit approximately) how many SIMD instructions your application uses and, most importantly, in which functions, is Intel VTune. This tool can profile MMX and SSE Technology Events, the events the processor registers when certain SIMD instructions are executed. For example, SIMD_INST_RETIRED.VECTOR counts the number of SSE2 integer instructions, and SIMD_INSTR_RETIRED counts the total number of SIMD instructions executed. Details are, of course, in the Intel VTune help.
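
As a rough illustration of items 4 and 5, here is a minimal C sketch of an element-wise array addition that uses the _mm_add_ps intrinsic when USE_SIMD is defined and falls back to regular code otherwise. The function name add_arrays is made up for this example, USE_SIMD is the conditional-compilation macro mentioned above (the build has to define it, e.g. with -DUSE_SIMD on an x86 target), and unaligned loads are an assumption chosen so that the caller does not need to align the data.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_SIMD
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);

#ifdef USE_SIMD
    /* SIMD path: one _mm_add_ps adds four pairs of floats at once. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif

    /* Regular code: handles everything without USE_SIMD,
       or only the leftover tail (n % 4 elements) with it. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

With USE_SIMD defined, a call on six elements handles the first four in a single SSE addition and the remaining two in the scalar tail; without it, all six go through the ordinary loop. The result is the same either way.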

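The pragmas from item 7 might be used roughly as in the following C sketch. The function names scale and sum_small are hypothetical, and the pragmas are wrapped in a check for __INTEL_COMPILER (the macro defined by the classic Intel compiler) purely as an assumption of this example, so that other compilers skip them instead of warning about unknown pragmas.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dst and src are passed as plain pointers, so the
   compiler may assume they could overlap. #pragma ivdep is the programmer's
   promise that no such dependency exists between iterations. */
void scale(float *dst, const float *src, float k, size_t n)
{
    size_t i;
#if defined(__INTEL_COMPILER)
#pragma ivdep
#endif
    for (i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* Hypothetical example: the programmer knows the trip count is tiny,
   so vectorizing this loop is not worth it; #pragma novector forbids it. */
float sum_small(const float *x, size_t n)
{
    float s = 0.0f;
    size_t i;

    assert(n <= 4);  /* assumption of this sketch: only short arrays */

#if defined(__INTEL_COMPILER)
#pragma novector
#endif
    for (i = 0; i < n; ++i)
        s += x[i];
    return s;
}
```

Note that #pragma ivdep only removes dependencies the compiler assumes; if dst and src really do overlap, the promise is broken and the vectorized loop may give wrong results.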

The end of the first 10 episodes, to be continued.

Source: https://habr.com/ru/post/94381/

