As it was written | What the author wanted to write | How it should be written |
get_global_id()
and everything about the language seems clear and primitive. It looks so much like the familiar C99 that you can easily fall into a trap and not notice it for a long time. Yes, OpenCL C is very similar to C, but it has some very useful differences, unfairly forgotten because C99 has no analogue for them, and some very insidious differences hiding behind similar syntax.
(int2)(1,2);
int2 a = (int2)(1, 2); // a=[1,2]
int2 b = (int2)(3, 4); // b=[3,4]
int4 c = (int4)(a, b); // c=[1,2,3,4]
int3 d = (int3)(1, c.xy); // d=[1,1,2]
float2 e = (float2)(1); // e=[1.0f,1.0f]
(float2)(1) and the other examples above are not type casts but a separate construct, the vector literal (see section 6.1.6, Vector Literals, of the OpenCL 1.2 specification).
int2 coords = (get_global_id(0), get_global_id(1));
Here coords is not initialized with a vector literal. A vector literal requires the vector type in front of the parentheses:
int2 coords = (int2)(get_global_id(0), get_global_id(1));
The expression (get_global_id(0), get_global_id(1)), by contrast, is a construct from ordinary C: two function calls in parentheses joined by the comma operator (,). Both functions are called, and the expression yields the result of the second one, as if we had written:
get_global_id(0);
int2 coords = get_global_id(1);
As a result, coords will be the vector [get_global_id(1), get_global_id(1)] rather than the expected [get_global_id(0), get_global_id(1)]. The compiler may emit "warning: expression result unused", but you should not count on it.
// comma operator: the result only looks right because all three values are equal
float3 color = (0.5f, 0.5f, 0.5f); // color=[0.5f, 0.5f, 0.5f]
// comma operator: only the last value is used
float3 color = (0.1f, 0.1f, 0.5f); // color=[0.5f, 0.5f, 0.5f]
// vector literal: correct
float3 color = (float3)(0.1f,0.1f, 0.5f); // color=[0.1f,0.1f, 0.5f]
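The comma-operator behaviour described above is inherited from C, so it can be reproduced on the host in plain C99. The sketch below is only an illustration: fake_global_id() and id_calls are hypothetical stand-ins for get_global_id(), not part of any OpenCL API.

```c
#include <stddef.h>

/* Hypothetical stand-in for get_global_id(): counts its calls and
   returns a fixed id for each dimension. */
int id_calls = 0;

size_t fake_global_id(unsigned dim)
{
    id_calls++;
    return dim == 0 ? 10u : 20u;
}

/* Both calls are executed, but the comma operator yields only the
   result of the second one. */
size_t comma_initializer(void)
{
    size_t value = (fake_global_id(0), fake_global_id(1));
    return value;
}
```

Calling comma_initializer() runs both functions, yet only the value for dimension 1 survives, which is exactly what happens to every component of coords in the kernel.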
int val = true;
int2 val2 = true;
What will val contain? And what will val2 contain?
When a bool value is converted to an integer (the bool type and the constants true and false exist both in C99 and in OpenCL C), false becomes zero and true becomes one. These are the rules for scalars, so val will hold 1. This is not always convenient, but the behavior is ingrained in every programmer's mind, and constructs like x+=(a>b) no longer surprise anyone.
For vectors, conversions from bool yield integers with either all bits cleared or all bits set, which corresponds to (int)-1. Here is what the specification says on this topic (section 6.2.2, Explicit Casts): "When casting a bool to a vector integer data type, the vector components will be set to -1 (i.e. all bits set) if the bool value is true and 0 otherwise."
So val2 will be the vector [-1, -1]. This is a little unexpected, because for other types a scalar is first converted to the type of the vector component and then replicated, but for bool exactly this behavior is specified. Used properly, it lets you replace conditional expressions with bitwise operations.
__kernel void bool_to_int_vec()
{
    int val = true;
    int2 val2 = true;
    printf("int val = true; // val=%d\n", val);
    printf("int2 val2 = true; // val2=%v2d\n", val2);
    if(val2.x == -1 && val2.y == -1)
    {
        printf("Compiler follows specification for bool->intn conversion, OK\n");
    }
    else
    {
        printf("Compiler does not follow specification for bool->intn conversion, FAILED\n");
    }
}
$ ./clrun ../kernels/bool_to_int_vec.cl
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Tonga
int val = true; // val=1
int2 val2 = true; // val2=-1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
int val = true; // val=1
int2 val2 = true; // val2=1,1
Compiler does not follow specification for bool->intn conversion, FAILED
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) HD Graphics
int val = true; // val=1
int2 val2 = true; // val2=1,1
Compiler does not follow specification for bool->intn conversion, FAILED
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
int val = true; // val=1
int2 val2 = true; // val2=1,1
Compiler does not follow specification for bool->intn conversion, FAILED
In the failing cases val2 is a vector of ones, not -1.
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Ellesmere
int val = true; // val=1
int2 val2 = true; // val2= -1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on AMD Accelerated Parallel Processing / Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
int val = true; // val=1
int2 val2 = true; // val2= 1,1
Compiler does not follow specification for bool->intn conversion, FAILED
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) HD Graphics 630
int val = true; // val=1
int2 val2 = true; // val2= -1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on Intel(R) OpenCL / Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
int val = true; // val=1
int2 val2 = true; // val2= -1,-1
Compiler follows specification for bool->intn conversion, OK
...
Running "bool_to_int_vec" kernel on NVIDIA CUDA / GeForce GTX 1060 6GB
int val = true; // val=1
1,1
Compiler does not follow specification for bool->intn conversion, FAILED
printf().
Like the conversion from bool to int, the corresponding operators behave differently for scalars and vectors. The result of the operators >, <, >=, <=, ==, !=, &&, || and ! is an integer. For scalars it is an int equal to 0 or 1. For vectors it is a vector of the same length of signed integers (int for int and float operands) with values of 0 or -1 (all bits set to 1).
int a = 1 > 0; // a=1
int4 b = (int4)(1) > (int4)(0); // b=[-1,-1,-1,-1]
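The all-bits-set result is what makes it possible to replace conditional expressions with bitwise operations. A minimal host-side sketch in plain C99, emulating a single vector component, might look like this; the function names are invented for the illustration, and two's complement integers are assumed.

```c
/* Emulates the per-component result of a vector comparison in OpenCL C:
   -1 (all bits set) when a > b, otherwise 0. */
int greater_mask(int a, int b)
{
    return -(a > b);
}

/* Branch-free selection: takes if_true where the mask is all ones
   and if_false where the mask is zero. */
int select_bits(int mask, int if_true, int if_false)
{
    return (if_true & mask) | (if_false & ~mask);
}

/* Maximum of two integers without a conditional expression. */
int max_branchless(int a, int b)
{
    return select_bits(greater_mask(a, b), a, b);
}
```

With a mask of 1 instead of -1 the same bitwise trick would keep only the lowest bit of if_true, which is why the vector convention uses -1.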
The ternary operator
expr1 ? expr2 : expr3
likewise behaves differently for scalars and vectors. For scalars, as in C99, the result of the expression is expr2 if expr1 is non-zero and expr3 if expr1 is zero. For vectors, first, expr1 can only be of an integer type. Second, the condition in expr1 is tested not against zero, and not even on the lowest bit, but on the most significant bit. The operator works componentwise. If one of the expressions expr2 and expr3 is a vector and the other is a scalar, the scalar is implicitly converted to the vector type with the corresponding components.
int a = 1 ? 1 : 0; // a=1
int4 b = (int4)(1, 0, 1, 0) ? (int4)(1) : 0; // b=[0,0,0,0]
int4 c = (int4)(-1, 0, -1, 0) ? 1 : (int4)(0); // c=[1,0,1,0]
int4 d = (uint4)(0x80000000u, 0, 0, 0) ? (int4)(1) : (int4)(0); // d=[1,0,0,0]
// valid in C99, but an error in OpenCL C
float e = 0.0f ? 1 : 2; // error: expr1 is not of an integer type
float4 f = (float4)(0) ? (float4)(1) : (float4)(2); // error: expr1 may not be float4, only expr2 and expr3 may
float4 g = (int4)(-1, 1, -1, 1) ? (float4)(1) : (float4)(0); // g=[1.0f, 0.0f, 1.0f, 0.0f]
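The most-significant-bit rule can be emulated on the host in plain C99 to check the values in the comments above. This is only a sketch of the rule for 32-bit components; select_component() and select4() are hypothetical helpers, not OpenCL built-ins.

```c
#include <stdint.h>

/* Emulates one component of the vector form of expr1 ? expr2 : expr3:
   only the most significant bit of the condition is tested. */
int32_t select_component(uint32_t cond, int32_t if_set, int32_t if_clear)
{
    return (cond & 0x80000000u) ? if_set : if_clear;
}

/* Applies the rule to four components, like a ternary on int4 values. */
void select4(const uint32_t cond[4], const int32_t a[4],
             const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = select_component(cond[i], a[i], b[i]);
    }
}
```

A condition component of 1 has its top bit clear, so the "false" branch is taken, while -1 (all bits set) selects the "true" branch.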
int a = 1 ? 1 : 0; // a=1
int4 b = (int4)(1) ? (int4)(1) : (int4)(0); // b=[0, 0, 0, 0]
Here b is filled with zeros, in full accordance with the specification and to the confusion of C programmers.
In C, the usual way to reinterpret data as another type is the memcpy function. In OpenCL there is no memcpy, but unlike C99, it is perfectly legal to use unions to interpret data as data of another type: "The OpenCL language extends the union to allow a program to access a member of a union object using a member of a different type."
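For comparison, here is how the memcpy approach mentioned above looks in plain C99 on the host. It is a small sketch that assumes a 4-byte IEEE 754 float; float_bits() is a name chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Copies the bytes of a float into a 32-bit integer.
   Assumes sizeof(float) == 4 and the IEEE 754 single-precision format. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under those assumptions float_bits(1.0f) gives 0x3F800000, the same bit pattern a union of float and uint would expose in OpenCL C.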
float a = 5.1f;
int b = a; // b = 5
float c = 1.6f;
int d = (int)c; // d = 1
float a = 4.7f;
float4 b = 5; // int 5 → float 5.0f → (float4)(5.0f, 5.0f, 5.0f, 5.0f)
int4 c = 4.7f; // float 4.7f → int 4 → (int4)(4, 4, 4, 4)
int4 d = 1; // int 1 → (int4)(1, 1, 1, 1)
int4 e = (int4) a; // float 4.7f → int 4 → (int4)(4, 4, 4, 4)
float4 f = a; // float 4.7f → (float4)(4.7f, 4.7f, 4.7f, 4.7f)
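float4 a = (float4)(5.0f, 5.0f, 5.0f, 5.0f);
int4 b = a; // error, implicit conversion between vector types is not allowed
float4 c = 0;
int4 d = (int4)c; // error, explicit casts between vector types are not allowed
int4 e = (int4)(c); // error, this is a vector literal, and float4 is not converted to int4
int4 f = (int4)(c.xy, c.zw); // error, this is a vector literal, and float2 is not converted to int2
int4 g = (int4)(c.x, c.y, c.z, c.w); // OK, each scalar float is converted to int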
float2 a = (float2)(1); // vector literal
float2 b = (float2)1; // explicit cast of a scalar to a vector
float2 c = 1; // implicit conversion of a scalar to a vector
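float2 a, b, c, d;
// vector literal
a = (float2)(1, 2); // a=[1, 2]
// 1 is explicitly cast to a vector, 2 is evaluated and discarded
b = (float2)1, 2; // b=[1, 1]
// 1 is implicitly converted to a vector, 2 is evaluated and discarded
c = 1, 2; // c=[1, 1]
// 1 is discarded, 2 is implicitly converted to a vector
d = (1, 2); // d=[2, 2]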
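convert_<destType>()
convert_<destType><_sat><_roundingMode>()
float a = 5.5f;
int b = convert_int(a); // b = 5
float4 c = a; // c=[5.5, 5.5, 5.5, 5.5]
float2 d = convert_float2(c); // error, the source and the destination must have the same number of components
int4 e = convert_int4(c); // e=[5,5,5,5]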
The _sat modifier enables saturation. Without it, integer overflow behaves as usual in C99. With it, values outside the range of the destination type are clamped to the nearest value representable in that type:
int a = 257;
uchar b = convert_uchar(a); // b = 1, overflow
b = convert_uchar_sat(a); // b = 255, saturation
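The difference between the two conversions can be emulated on the host in plain C99. The sketch below assumes 8-bit unsigned char; to_uchar_wrap() and to_uchar_sat() are names made up for the illustration and only mimic convert_uchar() and convert_uchar_sat() for int arguments.

```c
#include <limits.h>

/* Emulates convert_uchar() for an int argument: the value wraps
   modulo UCHAR_MAX + 1 (256 for an 8-bit unsigned char). */
unsigned char to_uchar_wrap(int value)
{
    return (unsigned char)value;
}

/* Emulates convert_uchar_sat(): the value is clamped to [0, UCHAR_MAX]. */
unsigned char to_uchar_sat(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > UCHAR_MAX) {
        return UCHAR_MAX;
    }
    return (unsigned char)value;
}
```

For 257 the wrapping version returns 1 and the saturating one returns 255, matching the comments in the OpenCL C snippet.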
For conversions to floating-point types, _sat is not allowed. It is not needed there, because floating-point values that overflow already become ±INF.
To control rounding, the following modifiers are provided: _rte (round to nearest even), _rtz (round toward zero), _rtp (round toward positive infinity) and _rtn (round toward negative infinity). If no rounding modifier is given, _rtz is used for conversions from floating-point to integer types and _rte for conversions from integer to floating-point types. Note that _rte uses not the usual schoolbook rounding but the so-called banker's rounding to the nearest integer: when the fractional part is exactly 0.5 and there is no single nearest integer, the even one of the two neighbors is chosen.
int a = convert_int_rtp(4.2f); // a = 5
a = convert_int(4.2f); // a = 4
int4 b = convert_int4_rte((float4)M_PI_F); // b = [3, 3, 3, 3]
Conversion of float to int with different rounding modes (tested here):
Rounding mode | 0.5 | -0.5 | 1.1 | -1.1 | 1.5 | -1.5 | 1.7 | -1.7 |
Round to nearest even (rte) | 0 | 0 | 1 | -1 | 2 | -2 | 2 | -2 |
Round toward zero (rtz) | 0 | 0 | 1 | -1 | 1 | -1 | 1 | -1 |
Round toward positive infinity (rtp) | 1 | 0 | 2 | -1 | 2 | -1 | 2 | -1 |
Round toward negative infinity (rtn) | 0 | -1 | 1 | -2 | 1 | -2 | 1 | -2 |
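The four rounding modes in the table can be emulated on the host in plain C99 using nothing but casts and comparisons. This is a sketch for values well inside the int range; the conv_* functions are hypothetical helpers that mimic convert_int_rtz(), convert_int_rtn(), convert_int_rtp() and convert_int_rte(), not the built-ins themselves.

```c
/* Round toward zero: a C cast from float to int truncates. */
int conv_rtz(float x)
{
    return (int)x;
}

/* Round toward negative infinity: step down if truncation went up. */
int conv_rtn(float x)
{
    int t = (int)x;
    return ((float)t > x) ? t - 1 : t;
}

/* Round toward positive infinity: step up if truncation went down. */
int conv_rtp(float x)
{
    int t = (int)x;
    return ((float)t < x) ? t + 1 : t;
}

/* Round to nearest, ties to even (banker's rounding). */
int conv_rte(float x)
{
    int lo = conv_rtn(x);
    float frac = x - (float)lo; /* distance above lo, in [0, 1) */
    if (frac > 0.5f) {
        return lo + 1;
    }
    if (frac < 0.5f) {
        return lo;
    }
    return (lo % 2 == 0) ? lo : lo + 1; /* exact tie: pick the even neighbor */
}
```

Running these helpers over the column values of the table reproduces each row, including the ties at ±0.5 and ±1.5 that distinguish rte from schoolbook rounding.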
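There, the rte mode corresponds to "even", rtz to "round → zero", rtp to "round up" and rtn to "round down".
In addition to unions, there is a family of as_<type>() functions for scalars and vectors that reinterpret the bits of a value as another type:
float a = 25.0f;
int b = as_int(a); // b=0x41C80000, the bit representation of 25.0f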
If the sizes of the source and destination types differ, as_<type>() should cause a compilation error:
int a = 0;
char b = as_char(a); // error, sizeof(int)!=sizeof(char)
float2 c = 0;
float8 d = as_float8(c); // error, sizeof(float2)!=sizeof(float8)
uint word = 0x01020304;
uchar4 bytes = as_uchar4(word);
Here bytes may turn out to be [4, 3, 2, 1], or [1, 2, 3, 4], or anything else, at the discretion of the particular OpenCL implementation. However, when optimizing for a single OpenCL implementation, such use of as_<type>() is quite acceptable.
The exception is the conversion of a 4-component vector to a 3-component one: in this case, according to the standard, as_<type>() must return the bits of the original type unchanged:
float4 a = 1.0f;
int3 b = as_int3(a); // OK, sizeof(int3)==sizeof(float4)
// b=[0x3f800000, 0x3f800000, 0x3f800000]
char3 c = as_char3(a); // error, sizeof(char3)!=sizeof(float4)
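One plausible reason for the implementation-defined order in the as_uchar4(word) example is byte order. The effect can be observed on the host in plain C99 with a reinterpreting copy. This is only an analogy, not a statement about how any particular OpenCL implementation works, and byte_at() and host_is_little_endian() are names invented for the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the given memory offset (0..3)
   of a 32-bit word, as a reinterpreting copy would see it. */
uint8_t byte_at(uint32_t word, int index)
{
    uint8_t bytes[4];
    memcpy(bytes, &word, sizeof bytes);
    return bytes[index];
}

/* Returns 1 when the least significant byte is stored first. */
int host_is_little_endian(void)
{
    return byte_at(0x01020304u, 0) == 0x04;
}
```

On a little-endian host the bytes of 0x01020304 come out as 4, 3, 2, 1, and on a big-endian one as 1, 2, 3, 4, which is why code relying on the order is portable only within one platform.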
Source: https://habr.com/ru/post/345984/