📜 ⬆️ ⬇️

What else you need to learn about OpenCL C before you write on it

As it was written
float4 val = (0, 0, 0, 0); 

What the author wanted to write
 float4 val = (float4)(0, 0, 0, 0); 

How to write
 float4 val = 0; 


If you come across OpenCL or plan to encounter and don’t see the difference between the first and second options, and the third one you have doubts - “Does it compile at all?” - welcome under the cat, there are many nuances of the language and absolutely nothing about the API and performance optimization .

The most expensive computer memory - in the head of programmers. This is probably why the two most popular GPU programming technologies — CUDA and OpenCL — are not based on fundamentally new parallel languages, not assemblers for specific architectures, but on adapted C ++ and C. In the case of OpenCL, the most popular core writing language is OpenCL C 1.2 - dialect C based on ISO C99. Removed the standard library, added vector types and operations on them, several functions for locking and determining its place among other threads, four address spaces. Simple atomic operations that are incompatible with C11 have been added (atomic operations and locks from C11 have been added to OpenCL C 2.0, which has not been widely adopted yet). Added some handy features that were not in C, something like standardized intrinsics .

In terms of language and API there are many examples, most of which are the addition of two vectors. There is a good, albeit dry, official specification , several English-language books, optimization tips from device manufacturers. As soon as the programmer understands how to write his task - he writes it on the familiar C99 using new functions like get_global_id() and everything in terms of language seems clear and primitive. Such a familiar C99 that you can easily fall into the trap and not notice it for the time being. Yes, OpenCL C is very similar to C, but there are some very useful differences, unfairly forgotten because there are no analogues in C99, and very insidious differences, hiding behind a similar syntax.

I looked through a lot of code on OpenCL C and people who are just starting to write on it, make the same mistakes:
')

As you can see from the list, it’s all a matter of type conversion. In the OpenCL 1.2 specification, these are sections 6.2. * Conversions and Type Casting . In addition, the next section 6.3 Operators is tricky, and nobody reads it either . As experience shows, much of the specification is written insufficiently clearly and too boringly - I will try to fill the gap in the available Russian-language documentation on these topics with this article.

Vector literals or explicit type conversions.


The new design in OpenCL C is a vector literal with which you can set the value of a vector. Unfortunately, its syntax is very similar to the explicit type conversion:

 ( )(   ) 

for example

 (int2)(1,2); 

or

 //      int2 a = (int2)(1, 2); // a=[1,2] //       int2 b = (int2)(3, 4); // b=[3,4] //          int4 c = (int4)(a, b); // c=[1,2,3,4] //         int3 d = (int3)(1, c.xy); // d=[1,1,2] //       ?! float2 e = (float2)(1); // e=[1.0f,1.0f] 

However, (float2)(1) and other examples above are not a type conversion, but a new construction (see 6.1.6 Vector Literals in the OpenCL 1.2 specification ).

Inside the second brackets there must be a total of as many scalars or vector components as in a vector type inside the first brackets. There is one exception - if to the right there is only one scalar value in brackets, then it itself "multiplies" to the required number of vector components.

Explicit coercion of vector types in the style of C is simply not in the language. A fatal mistake can be made if you see the “familiar” type conversion instead of a vector literal with obscene eyes. Then the type in parentheses can be removed at the beginning: “After all, it is compiled anyway, why unnecessary type conversion? Already implicitly led. "

Real example:

 int2 coords = (get_global_id(0), get_global_id(1)); 

coords is not a vector literal given, for a vector literal it was necessary to add a vector type:

 int2 coords = (int2)(get_global_id(0), get_global_id(1)); 

We got the following: (get_global_id(0), get_global_id(1)) and this is a construction from ordinary C - in brackets the call of two functions through the operator " , " (comma), which means that both functions will be executed and the expression will return the result second function, as if we wrote:

 get_global_id(0); int2 coords = get_global_id(1); 

The implicit conversion of a scalar to a vector (about it a little further) will work and the coords will be the vector [get_global_id(1), get_global_id(1)] , rather than [get_global_id(0), get_global_id(1)] , as expected.

Fortunately, for simple cases the compiler may issue a warning like “ warning: expression result unused ”, but you should not count on it.

This code can still be quickly found, because it does not work correctly. But the following example will work, while the color is gray. When we want to change the color, for some reason it will still be mockingly gray.

 //  ,    float3 color = (0.5f, 0.5f, 0.5f); // color=[0.5f, 0.5f, 0.5f] 

The code works, the project is delivered. And suddenly it took a slight change - the color of gray to make dark blue.

 //  ,    float3 color = (0.1f, 0.1f, 0.5f); // color=[0.5f, 0.5f, 0.5f] 

It was necessary to use the vector literal:

 //   float3 color = (float3)(0.1f,0.1f, 0.5f); // color=[0.1f,0.1f, 0.5f] 

Convert Boolean Values ​​to Vectors


 int val = true; int2 val2 = true; 

What is the value of val ? What - in val2 ?

For scalars, the ISO C99 rules apply, when converting the bool value (and the bool type and the constants true and false exist in C99 and in OpenCL C) false becomes zero, and true becomes unity. These are rules for scalars. Thus, in val will be "1". This is not always convenient, but this behavior is embedded in the brain of a programmer — constructions like x+=(a>b) no longer surprising.

However, in OpenCL C, when converting to a vector integer type, values ​​of type bool return either integers with all bits in zero, or all bits in one, which corresponds to (int)-1 . Here is what the specification says on this topic (section 6.2.2 Explicit Casts ):

It will be set to -1 (ie all bits set) if it is true.

Thus, in val2 will be a vector [-1, -1] . This is a little unexpected in the context of type conversion when the expression is first cast to the type of the vector component, and then multiplies - as for the other types, but for bool exactly this behavior is declared. With proper use, it allows you to replace conditional expressions with bitwise operations.

For quick tests like “Compile or not? What is the value in the variable? ”I wrote and posted the opencl-sandbox project on the github. I checked all the examples from this article on my car. Including this :

 __kernel void bool_to_int_vec() { int val = true; int2 val2 = true; printf("int val = true; // val=%d\n", val); printf("int2 val2 = true; // val2=%v2d\n", val2); if(val2.x == -1 && val2.y == -1) { printf("Compiler follows specification for bool->intn conversion, OK\n"); } else { printf("Compiler does not follow specification for bool->intn conversion, FAILED\n"); } } 

As you know, compiler developers are also people and don’t remember the specifications by heart.
In my car, as a result of the experiment with two platforms, I observed two devices in each:
$ ./clrun ../kernels/bool_to_int_vec.cl
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Tonga
int val = true; // val=1
int2 val2 = true; // val2=-1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
int val = true; // val=1
int2 val2 = true; // val2=1,1
Compiler does not follow specification for bool->intn conversion, FAILED
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) HD Graphics
int val = true; // val=1
int2 val2 = true; // val2=1,1
Compiler does not follow specification for bool->intn conversion, FAILED
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
int val = true; // val=1
int2 val2 = true; // val2=1,1
Compiler does not follow specification for bool->intn conversion, FAILED


Two OpenCL platforms - AMD and Intel. Each platform has two devices - a GPU and a CPU. And only the AMD compiler for the GPU (the most mature) follows the specification, the other three write to val2 vector of units, not -1.

After some time, I checked the same kernel on another machine with three OpenCL implementations — from AMD, Intel, and NVidia:
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Ellesmere
int val = true; // val=1
int2 val2 = true; // val2= -1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
int val = true; // val=1
int2 val2 = true; // val2= 1,1
Compiler does not follow specification for bool->intn conversion, FAILED
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) HD Graphics 630
int val = true; // val=1
int2 val2 = true; // val2= -1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
int val = true; // val=1
int2 val2 = true; // val2= -1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on NVIDIA CUDA / GeForce GTX 1060 6GB
int val = true; // val=1
1,1
Compiler does not follow specification for bool->intn conversion, FAILED


There are five devices in the system. AMD compilers behave the same way. Intel's more recent compiler has “corrected” and now behaves in accordance with the standard. The NVidia compiler failed not only to convert to a vector type, but simply to display the string in the second printf() .

There are two conclusions from this:

  1. without knowledge of the specification, do not write portable code;
  2. it is necessary to cover OpenCL kernels with tests, because each platform understands the specifications in its own way.


Logical and vector comparison operators


As well as to bring bool to int , the corresponding operators have different behavior for scalars and vectors. Values ​​of the result of the operators > , < , >= , <= , == != , && , || ! This is an int . For scalars, 0 or 1. For vectors, a vector of the appropriate length from int 's with values ​​of 0 or -1 (all bits are set to 1).

 int a = 1 > 0; // a=1 int4 b = (int4)(1) > (int4)(0); // b=[-1,-1,-1,-1] 

When checking on 4 compilers, this time everyone gave the correct result.

Ternary operator for vectors


Ternary operator of the form “ exp1 ? expr2 : expr3 exp1 ? expr2 : expr3 "also behaves similarly for scalars and vectors in different ways. For scalars, as in C99, the result of the expression is expr2 if expr1 non-zero and exp3 if expr1 zero.

For vectors first, the type expr1 can only be integer. Secondly, when checking the condition in expr1 check does not go to equality to zero, or even to the first bit, but to the most significant bit . In this case, the operator works componentwise. If one of the expressions expr2 and expr3 is a vector and the other is a scalar, then the scalar is implicitly converted to a vector type with the corresponding components.

 int a = 1 ? 1 : 0; // a=1 int4 b = (int4)(1, 0, 1, 0) ? (int4)(1) : 0; // b=[0,0,0,0] int4 c = (int4)(-1, 0, -1, 0) ? 1 : (int4)(0); // c=[1,0,1,0] int4 d = (uint4)(0x80000000u, 0, 0, 0) ? (int4)(1) : (int4)(0); // d=[1,0,0,0] //  C99  ,     OpenCL     float e = 0.0f ? 1 : 2; //  , expr1     float4 f = (float4)(0) ? (float4)(1) : (float4)(2); // expr2  expr3     float4 g = (int4)(-1, 1, -1, 1) ? (float4)(1) : (float4)(0); // g=[1.0f, 0.0f, 1.0f, 0.0f] 

As you can see, here you can again fall into the trap of similarity. Compare the same code, accurate to vectorization:

 int a = 1 ? 1 : 0; // a=1 int4 b = (int4)(1) ? (int4)(1) : (int4)(0); // b=[0, 0, 0, 0] 

Vector b filled with zeros, in full accordance with the specification and to the confusion of the C programmers.

Convert real and integer types to OpenCL C


For scalar types, conversions from integer types to real and from real to integer are performed according to the same rules as in C99 — that is, when converting from a real number to an integer, the fractional part is discarded; when converting from an integer to a real number, a real number with same value as the original integer. If the number does not fit into the range of the type to which the conversion occurs - the result depends on the implementation.

If you need to interpret data of one type as data of another, then the only way to do this in C99 is to use the memcpy function. In OpenCL, there is no memcpy , but unlike C99, it is absolutely legal to use union 's to interpret data as data of another type:

For example, the use of a language is a branch of a different type.

On top of that, vector types and iron capabilities for saturation operations are supported - this dictates the features of type conversion in OpenCL.

The following types of type conversions are supported:

  1. Implicit Conversions;
  2. explicit cast (Explicit Casts);
  3. Explicit Conversions
  4. interpretation of data as data of another type (Reinterpreting Data As Another Type).

In OpenCL, points 1 and 2 are similar to C99, points 3 and 4 are innovations for the convenience and clarity of working with vector types.

Implicit conversions and C type explicit casts


As in C99, if there are operands of different types in the expression, they are converted to the same general type. The difference is how it works for vectors. For scalar types, implicit type conversion and explicit type conversion are supported in the same way as in C99:

 float a = 5.1f; int b = a; // b = 5 float c = 1.6f; int d = (int)c; // d = 1 

When explicitly or implicitly converting from a scalar type to a vector, the scalar is first converted to the type of a vector element according to the rules similar to C99, and then multiplied to
size vector type:

 float a = 4.7f; float4 b = 5; // int 5 → float 5.0f → (float4)(5.0f, 5.0f, 5.0f, 5.0f) int4 c = 4.7f; // float 4.7f → int 4 → (int4)(4, 4, 4, 4) int4 d = 1; // int 1 → (int4)(1, 1, 1, 1) int4 e = (int4) a;// float 4.7 → int 4 → (int4)(4, 4, 4, 4)      float4 f = a; // float 4.7f → (float4)(4.7f, 4.7f, 4.7f, 4.7f) 

Implicit conversion and explicit coercion in the C style of one vector type into another is prohibited. Even if they have the same number of components.

 float4 a = (float4)(5.0f, 5.0f, 5.0f, 5.0f); //   ,     int4 b = a; // ,        float4 c = 0; int4 d = (int4)c; // ,        int4 e = (int4)(c); // ,      —      float4  int4 int4 f = (int4)(c.xy, c.zw); // ,      —      float2  int2 int4 g = (int4)(cx, cy, cz, cw); //      ,       float  int 

There is no explicit reduction of vector types, however, a scalar can be reduced to a vector type. This adds additional confusion to vector literals. Compare three ways to set a vector with the same components:

 float2 a = (float2)(1); //   float2 b = (float2)1; //       float2 c = 1; //       

For vectors with different components, the same code does not work, you only need to use a vector literal. What's the worst, all the code below compiles perfectly, just the results will be relevant:

 float2 a, b, c, d; //    a = (float2)(1, 2); // a=[1, 2] // 1       , 2 —  b = (float2)1, 2; // b=[1, 1] // 1       , 2 —  c = 1, 2; // c=[1, 1] // 1 , 2        d = (1, 2); // d=[2, 2] 

Explicit conversion of real and integer types


In addition to C-style type casting, OpenCL has a type-casting mechanism that handles overflow situations and works with vectors. This is a family of functions.

 convert_() 

and more general functions

 convert_<_sat><_>() 

which additionally take overflow mode and rounding mode. For scalars and vectors, functions work in the same way. The number of elements in the vectors of the original and the resulting types must match.

 float a = 5.5f; int b = convert_int(a); // b = 5 float4 c = a; // c=[5.5, 5.5, 5.5, 5.5] float2 d = convert_float2(c); // ,           //         //        int4 e = convert_int4(c); // e=[5,5,5,5] 

When casting to integer types, overflow behavior is determined by the optional
modifier _sat . Without it, the overflow of the integer type occurs as usual in C99, with it the saturation works, the values ​​outside the range allowed by the range are reduced to the closest possible value represented in the converted type:

 int a = 257; uchar b = convert_uchar(a); // b = 1,   b = convert_uchar_sat(a); // b = 255,   

When casting to real types, the use of _sat not allowed. This is not necessary, because with the overflow of real types, they already become ± INF.

To control the rounding, the _rte modifiers are provided: _rte (round to nearest even), _rtz (round to zero), _rtp (round to _rtp infinity) and _rtn (round to negative infinity), which designate rounding to the nearest integer, rounding to zero, rounding to plus infinity and rounding to minus infinity respectively. In the absence of a rounding modifier, _rtz used to convert from real to integer and _rte when converting from integer to real. In _rte , not the usual mathematics is used, but the so-called “ bank ” version of rounding to the nearest integer. When the fractional part is exactly 0.5, then there is no one nearest integer number, the even one is chosen from the two nearest ones.

 int a = convert_int_rtp(4.2f); // a = 5 a = convert_int(4.2f); // a = 4 int4 b = convert_int4_rte((float4)M_PI_F); // b = [3, 3, 3, 3] 

Converting float to int with different rounding modes (tested here ):

0.5
-0.5
1.1
-1.1
1.5
-1.5
1.7
-1.7
Rounding to the nearest integer
(round to nearest even, rte)
0
0
one
-one
2
-2
2
-2
Rounding to zero
(round towards zero, rtz)
0
0
one
-one
one
-one
one
-one
Round to plus infinity
(round toward positive infinity, rtp)
one
0
2
-one
2
-one
2
-one
Rounding to minus infinity
(round toward negative infinity, rtn)
0
-one
one
-2
one
-2
one
-2

In the English-language article about rounding on Wikipedia there is a wonderful illustration . The rte mode on it corresponds to “even”, rtz - “round → zero”, rtp - “round up”, rtn - “round down”.

Interpreting data as another type of data


To interpret data of one type as data of another type in OpenCL, there is, in addition to union 's, a family of functions as_() for scalars and vectors:

 float a = 25.0f; int b = as_int(a); // b=0x41C80000,     25.0f 

If the size in bytes of the original and the new types do not match, then the as_ should cause a compilation error:

 int a = 0; char b = as_char(a); //, sizeof(int)!=sizeof(char) float2 c = 0; float8 d = as_float8(c); //, sizeof(float2)!=sizeof(float8) 

If the number of elements in the source and new types does not match (but the sizes of the types are the same), the result depends on the implementation of OpenCL (implementation-defined), except for the case when the operand is a 4-component vector, and the result is a 3-component vector. So, it is convenient to get the bytes of a 32-bit word as elements of a vector:

 uint word = 0x01020304; uchar4 bytes = as_uchar4(word); 

But the result can be both [4, 3, 2, 1] , and [1, 2, 3, 4] , and anything, at the discretion of a specific implementation of OpenCL. However, when optimizing and working on any one version of OpenCL, such use of as_ is quite acceptable.

If the operand is a 4-component vector, and the result is a 3-component vector, then
as_ must return the bits of the original type unchanged - according to the standard
the sizes of the vectors of the three components are equal to the size of the vectors of the four components, if the sizes of their elements are the same.

 float4 a = 1.0f; int3 b = as_int3(a); // ,   sizeof(int3)==sizeof(float4) // b=[0x3f800000, 0x3f800000, 0x3f800000] char3 c = as_char3(a); // , sizeof(char3)!=sizeof(float4) 

Conclusion


OpenCL C is cunning in its similarity to the usual C99. I hope after reading this article you

Source: https://habr.com/ru/post/345984/


All Articles