
Using readable and writable images in OpenCL 2.0

Before OpenCL 2.0, it was impossible to read and write the same image within a single kernel. An image could be created with CL_MEM_READ_WRITE, but once it was passed to a kernel, the kernel had to choose one of two access qualifiers: __read_only (read-only access) or __write_only (write-only access). OpenCL 2.0 makes it possible to read and write an image within the same kernel. However, there are several caveats, which we discuss in detail in this post.

input1 = clCreateImage( oclobjects.context, CL_MEM_READ_WRITE|CL_MEM_COPY_HOST_PTR, &format, &desc, &input_data1[0], &err );
SAMPLE_CHECK_ERRORS( err );
Code snippet 1. An image could already be created with the CL_MEM_READ_WRITE flag.

__kernel void Alpha( __read_write image2d_t inputImage1,
                     __read_only image2d_t inputImage2,
                     uint width,
                     uint height,
                     float alpha,
                     float beta,
                     int gamma )
Code snippet 2. In OpenCL 2.0, it became possible to read and write images within a single kernel.

The advantages of images available for reading and writing


Image convolution is less effective with the new read-write image functionality, but other image-processing algorithms can benefit from it significantly. One example of such an algorithm is image compositing, the overlay of one image on another.
In OpenCL 1.2 and earlier versions, an image could carry only the __read_only or __write_only qualifier. OpenCL 2.0 introduces the __read_write qualifier, so the output can be written back to the input image, which reduces the number of resources required. Previously, to modify an image in place you had to treat the image as a buffer and operate on that buffer (see cl_khr_image2d_from_buffer).

Without __read_write, the workaround is to treat the images as buffers and manage the buffers yourself. Treating two-dimensional images as buffers consumes extra resources, and it also becomes impossible to use the clamping and filtering capabilities available through read_image calls. It is therefore preferable to use images with the read_write qualifier.
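For illustration, here is a minimal sketch of that buffer-based workaround, assuming a device that supports cl_khr_image2d_from_buffer and reusing oclobjects.context from code snippet 1. The width and height variables, the RGBA8 layout and the row pitch are assumptions; a real pitch must satisfy the device's CL_DEVICE_IMAGE_PITCH_ALIGNMENT.

cl_int err;
size_t pitch = width * 4 * sizeof(cl_uchar); /* RGBA8 rows; assumed to meet the pitch alignment */
cl_mem buf = clCreateBuffer( oclobjects.context, CL_MEM_READ_WRITE, pitch * height, NULL, &err );
SAMPLE_CHECK_ERRORS( err );

cl_image_format format = { CL_RGBA, CL_UNORM_INT8 };
cl_image_desc desc = { 0 };
desc.image_type = CL_MEM_OBJECT_IMAGE2D;
desc.image_width = width;
desc.image_height = height;
desc.image_row_pitch = pitch;
desc.buffer = buf; /* the image aliases the buffer's storage */

/* Kernels can now touch the same pixels either through buf or through img. */
cl_mem img = clCreateImage( oclobjects.context, CL_MEM_READ_WRITE, &format, &desc, NULL, &err );
SAMPLE_CHECK_ERRORS( err );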

Sample Overview


In the sample, two bitmap images (input1.bmp and input2.bmp) are loaded. The images are then blended, one over the other, based on the alpha value, a weighting factor in the per-pixel equation. The alpha value is passed in as a parameter.


Figure 1. Alpha = 0.84089642

Input images must be 24-bit or 32-bit; the output is a 24-bit image. The input images must be the same size. The images are stored in ARGB format, which was taken into account when loading them.


Figure 2. Alpha = 0.32453

The ARGB data is converted to RGBA on load. Changing the beta value causes a significant change in the output image.
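As a rough sketch of that reordering, for one 32-bit pixel packed as 0xAARRGGBB (the packed layout and the helper name are assumptions, not code from the sample):

/* Repack one 32-bit ARGB pixel into RGBA byte order. */
static inline unsigned int argb_to_rgba( unsigned int argb )
{
    unsigned int a = ( argb >> 24 ) & 0xFF;
    unsigned int r = ( argb >> 16 ) & 0xFF;
    unsigned int g = ( argb >> 8 ) & 0xFF;
    unsigned int b = argb & 0xFF;
    return ( r << 24 ) | ( g << 16 ) | ( b << 8 ) | a; /* 0xRRGGBBAA */
}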
SDK usage
The SDK demonstrates image overlay using read-write images. The following command-line options control the sample code:

-h, --help
    Display this text and exit.
-p, --platform <number or string>
    Select the platform whose devices are used.
-t, --type all | cpu | gpu | acc | default | <OpenCL constant for device type>
    Select the type of device on which the OpenCL kernel runs.
-d, --device <number or string>
    Select the device on which all work is performed.
-i, --infile <24-bit or 32-bit input bmp file>
    The name of the first input file in BMP format. The default is input1.bmp.
-j, --infile <24-bit or 32-bit input bmp file>
    The name of the second input file in BMP format. The default is input2.bmp.
-o, --outfile <24-bit or 32-bit output bmp file>
    The name of the output file to write. The default is output.bmp for OpenCL 1.2 and 20_output.bmp for OpenCL 2.0.
-a, --alpha <floating-point value between zero and one>
    A non-zero positive value that determines the weighting of the two images when they are combined. The default alpha value is 0.84089642, which gives a default beta value of 0.15910358.

Default values are set in the sample, so the application can run without any user input. Users can supply their own input BMP files; all files must be 24-bit or 32-bit. The alpha value determines how strongly the first image overlays the second.

calculatedPixel = ((currentPixelImage1 * alpha) + (currentPixelImage2 * beta) + gamma);

The beta value is one minus the alpha value.

float beta = 1 - alpha;

These two values determine the "weight" of images 1 and 2 in the output image.
The gamma value can be used to change the brightness of each pixel; by default it is zero. It lets the user brighten or darken the output image as a whole. A sketch of the full blend kernel is shown below.
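Putting these pieces together, here is a minimal sketch of what the blend kernel could look like, reusing the signature from code snippet 2. The samplerless reads, the bounds check and the direct addition of gamma to the normalized pixel values are assumptions, not the sample's actual code:

__kernel void Alpha( __read_write image2d_t inputImage1,
                     __read_only image2d_t inputImage2,
                     uint width, uint height,
                     float alpha, float beta, int gamma )
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    if( pos.x >= (int)width || pos.y >= (int)height )
        return;

    /* Samplerless reads at the work-item's own coordinate. */
    float4 currentPixelImage1 = read_imagef( inputImage1, pos );
    float4 currentPixelImage2 = read_imagef( inputImage2, pos );

    float4 calculatedPixel = (currentPixelImage1 * alpha)
                           + (currentPixelImage2 * beta)
                           + (float)gamma;

    /* With __read_write, the result can go straight back into the input image. */
    write_imagef( inputImage1, pos, calculatedPixel );
}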

An example of running the program



Figure 3. Running the program on an OpenCL 2.0 device

Limitations of images available for reading and writing


Barriers cannot be used with images that require synchronization between different work-groups. Image convolution requires synchronization of all work-items: convolving an image usually involves a mathematical operation on two matrices that produces a third matrix as the result. The image convolution example uses a Gaussian blur; other examples include image sharpening, edge detection and embossing.

As an example, consider the Gaussian blur. A Gaussian filter is a low-pass filter that removes high-frequency components; as a result, the level of image detail decreases and a blur effect is obtained. Applying a Gaussian blur is equivalent to convolving the image with a Gaussian function (often referred to as a mask). To demonstrate the read-write image functionality, the blur is applied horizontally and then vertically.
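For reference, here is a minimal sketch of how the 1D mask consumed by the kernels below might be built on the host; the function name, the sigma parameter and the normalization scheme are assumptions:

#include <math.h>

/* Fill mask[0 .. 2*maskSize] with normalized 1D Gaussian weights. */
void buildGaussianMask( float* mask, int maskSize, float sigma )
{
    float sum = 0.0f;
    for( int i = -maskSize; i <= maskSize; ++i )
    {
        float w = expf( -(float)(i * i) / (2.0f * sigma * sigma) );
        mask[maskSize + i] = w;
        sum += w;
    }
    for( int i = 0; i < 2 * maskSize + 1; ++i )
        mask[i] /= sum; /* weights sum to 1, so overall brightness is preserved */
}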

In OpenCL 1.2, this has to be done in two passes: one kernel performs only the horizontal blur and another only the vertical blur. The result of one blur is used as the input for the next (depending on which blur ran first).

/* Sampler assumed to be declared at program scope in the original sample. */
__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

__kernel void GaussianBlurHorizontalPass( __read_only image2d_t inputImage,
                                          __write_only image2d_t outputImage,
                                          __constant float* mask,
                                          int maskSize )
{
    int2 currentPosition = (int2)(get_global_id(0), get_global_id(1));
    float4 currentPixel = (float4)(0,0,0,0);
    float4 calculatedPixel = (float4)(0,0,0,0);
    for( int maskIndex = -maskSize; maskIndex < maskSize+1; ++maskIndex )
    {
        currentPixel = read_imagef( inputImage, imageSampler, currentPosition + (int2)(maskIndex, 0) );
        calculatedPixel += currentPixel * mask[maskSize + maskIndex];
    }
    write_imagef( outputImage, currentPosition, calculatedPixel );
}

__kernel void GaussianBlurVerticalPass( __read_only image2d_t inputImage,
                                        __write_only image2d_t outputImage,
                                        __constant float* mask,
                                        int maskSize )
{
    int2 currentPosition = (int2)(get_global_id(0), get_global_id(1));
    float4 currentPixel = (float4)(0,0,0,0);
    float4 calculatedPixel = (float4)(0,0,0,0);
    for( int maskIndex = -maskSize; maskIndex < maskSize+1; ++maskIndex )
    {
        currentPixel = read_imagef( inputImage, imageSampler, currentPosition + (int2)(0, maskIndex) );
        calculatedPixel += currentPixel * mask[maskSize + maskIndex];
    }
    write_imagef( outputImage, currentPosition, calculatedPixel );
}
Code snippet 3. Gaussian blur kernels in OpenCL 1.2

In OpenCL 2.0, these two kernels can be merged into one, using a barrier in an attempt to force one blur (horizontal or vertical) to finish before the next one starts.

__kernel void GaussianBlurDualPass( __read_only image2d_t inputImage,
                                    __read_write image2d_t tempRW,
                                    __write_only image2d_t outputImage,
                                    __constant float* mask,
                                    int maskSize )
{
    int2 currentPosition = (int2)(get_global_id(0), get_global_id(1));
    float4 currentPixel = (float4)(0,0,0,0);
    float4 calculatedPixel = (float4)(0,0,0,0);

    /* Horizontal pass writes into the read-write temporary image. */
    for( int maskIndex = -maskSize; maskIndex < maskSize+1; ++maskIndex )
    {
        currentPixel = read_imagef( inputImage, currentPosition + (int2)(maskIndex, 0) );
        calculatedPixel += currentPixel * mask[maskSize + maskIndex];
    }
    write_imagef( tempRW, currentPosition, calculatedPixel );

    /* Only synchronizes work-items within one work-group - see below. */
    barrier( CLK_GLOBAL_MEM_FENCE );

    /* Vertical pass reads the temporary image back. */
    calculatedPixel = (float4)(0,0,0,0);
    for( int maskIndex = -maskSize; maskIndex < maskSize+1; ++maskIndex )
    {
        currentPixel = read_imagef( tempRW, currentPosition + (int2)(0, maskIndex) );
        calculatedPixel += currentPixel * mask[maskSize + maskIndex];
    }
    write_imagef( outputImage, currentPosition, calculatedPixel );
}
Code snippet 4. Gaussian blur kernel in OpenCL 2.0

It turned out that the barriers are ineffective here. Using a barrier does not guarantee that the horizontal blur completes before the vertical blur begins (when the horizontal blur runs first), so repeated runs produced different results. Barriers can only synchronize work-items within a single work-group. The root cause is that edge pixels are read by several work-groups, and there is no way to synchronize between work-groups. The initial assumption that a single-kernel Gaussian blur could be implemented with read-write images turned out to be incorrect, because OpenCL cannot synchronize data dependencies across work-groups.
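A reliable alternative is to keep the two OpenCL 1.2-style kernels from code snippet 3 and let an in-order command queue enforce the ordering. A minimal host-side sketch follows; the queue, the kernel objects and the cl_mem handles are assumptions:

size_t gws[2] = { width, height };

/* Horizontal pass: inputImage -> tempImage. */
clSetKernelArg( horizontalKernel, 0, sizeof(cl_mem), &inputImage );
clSetKernelArg( horizontalKernel, 1, sizeof(cl_mem), &tempImage );
clSetKernelArg( horizontalKernel, 2, sizeof(cl_mem), &maskBuffer );
clSetKernelArg( horizontalKernel, 3, sizeof(cl_int), &maskSize );
err = clEnqueueNDRangeKernel( queue, horizontalKernel, 2, NULL, gws, NULL, 0, NULL, NULL );
SAMPLE_CHECK_ERRORS( err );

/* Vertical pass: tempImage -> outputImage. On an in-order queue this
   cannot start until the horizontal pass has finished. */
clSetKernelArg( verticalKernel, 0, sizeof(cl_mem), &tempImage );
clSetKernelArg( verticalKernel, 1, sizeof(cl_mem), &outputImage );
clSetKernelArg( verticalKernel, 2, sizeof(cl_mem), &maskBuffer );
clSetKernelArg( verticalKernel, 3, sizeof(cl_int), &maskSize );
err = clEnqueueNDRangeKernel( queue, verticalKernel, 2, NULL, gws, NULL, 0, NULL, NULL );
SAMPLE_CHECK_ERRORS( err );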

Source: https://habr.com/ru/post/268851/

