OpenCL under C# is easy

Although OpenCL appeared back in 2008, it has not become widespread so far. Its advantages are undeniable: faster computation, cross-platform support, the ability to run code on both GPUs and CPUs, and backing from a number of companies, including Apple, AMD, Intel, nVidia and others. The drawbacks are few but real. It runs slower on nVidia hardware than CUDA does, and it is harder to use. The first drawback matters only in serious development, where program speed is more important than portability. The second is the main obstacle for developers choosing a development approach: it takes a lot of time to make sense of all the headers, drivers and standards. But it is not all that bad. This article is a short guide to the easiest way to run OpenCL from C# and enjoy parallel programming.

Driver setup

The longest and most difficult part is setting up the OpenCL drivers. I will say right away: if the operating system has been on your computer for more than a year, you have an AMD card, and the drivers have been updated over that time, you will most likely have to reinstall the system.

AMD

Although AMD has some driver problems, it has a great SDK with many examples, and its support forums are active. Recent AMD Catalyst drivers install OpenCL support automatically. The SDK can be found here. Before installing, check whether your video card supports OpenCL. For AMD, partial support starts with the 4300 series and full support with the 5400 series.

nVidia

In general, nVidia has fewer driver problems, but they do occur sometimes. OpenCL is supported from roughly the 8600 series onward. You can get the driver here.

Intel

Unlike with AMD and nVidia, I have never had a problem with Intel drivers. You can get them here.

Installing the drivers does not guarantee that they will work. AMD's bug count is off the charts (problems arose on 2 of my 3 computers), and nVidia's is high too (1 of 3). So after installation, first check that OpenCL is actually available. The simplest way is a program that shows video card parameters. I use GpuCapsViewer; opencl-z and GPU-Z also work.
If problems arise, remove all old drivers and reinstall. For AMD, make sure you install the correct driver version: their laptop drivers are often buggy. If the problems persist, reinstalling Windows will help.

Wrappers

Since our goal is to program OpenCL from C# as simply as possible, we will not bother wiring up the OpenCL headers ourselves. Instead, we will use ready-made libraries that simplify development. In my opinion, the most complete and least buggy option today is cloo.dll, which is part of OpenTK. The easiest to use, and the one that automates the most operations, is OpenCLTemplate, a layer on top of cloo. Its drawback is some glitches on AMD hardware. For example, with the latest drivers (11.6), devices may refuse to initialize. Since the project is open source, I found and fixed the glitches I ran into, but I don't know when a new version of the library with these fixes will come out. There are also some lesser-known wrappers that can be found on the Internet.

First program

As a first program, let's use OpenCL to compute the sum of two vectors, v1 and v2. Here is a program written with cloo.dll:
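To make the OpenCL execution model concrete: launching `floatVectorSum` with a global size of N runs the kernel body N times, and `get_global_id(0)` returns the index of the current work item. The sketch below is a plain C# analogue of that launch, not part of cloo or OpenCL. It emulates the work items on the CPU with `Parallel.For`, and the class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical CPU analogue of the floatVectorSum kernel (not an OpenCL or cloo API).
public static class VectorSumReference
{
    // Body of one work item: the same statement as in the kernel source.
    // globalId plays the role of get_global_id(0).
    private static void FloatVectorSum(float[] v1, float[] v2, int globalId)
    {
        v1[globalId] = v1[globalId] + v2[globalId];
    }

    // Emulates launching the kernel with a one-dimensional global size:
    // one invocation per index 0..globalSize-1, in no guaranteed order.
    public static void Execute(float[] v1, float[] v2, int globalSize)
    {
        if (globalSize > v1.Length || globalSize > v2.Length)
            throw new ArgumentOutOfRangeException(nameof(globalSize));
        Parallel.For(0, globalSize, i => FloatVectorSum(v1, v2, i));
    }
}
```

Each work item writes only its own element `v1[i]`, so the invocations are independent and can run in any order. This independence is what makes the kernel safe to run in parallel on a GPU.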

Cloo program

```csharp
using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;
using Cloo;

// Inside the Form class:
private void button4_Click(object sender, EventArgs e)
{
    // Pick a platform. Platforms[1] is the index on the author's machine;
    // inspect ComputePlatform.Platforms to find the right one on yours.
    ComputePlatform platform = ComputePlatform.Platforms[1];

    // Create a context containing all devices of this platform (GPU and CPU).
    ComputeContextPropertyList properties = new ComputeContextPropertyList(platform);
    ComputeContext context = new ComputeContext(ComputeDeviceTypes.All, properties, null, IntPtr.Zero);

    // Program executed on the device (GPU or CPU), written in the C99-based OpenCL C.
    // Each work item adds one element of v2 to the matching element of v1.
    string vecSum = @"
        __kernel void floatVectorSum(__global float * v1, __global float * v2)
        {
            int i = get_global_id(0);
            v1[i] = v1[i] + v2[i];
        }";

    // Compile vecSum for every device in the context. Do not swallow build
    // errors (an empty catch leaves prog unusable): show the compiler log.
    ComputeProgram prog = new ComputeProgram(context, vecSum);
    try
    {
        prog.Build(context.Devices, "", null, IntPtr.Zero);
    }
    catch (ComputeException)
    {
        MessageBox.Show(prog.GetBuildLog(context.Devices[0]));
        throw;
    }

    // Get the compiled kernel by name.
    ComputeKernel kernelVecSum = prog.CreateKernel("floatVectorSum");

    // Host arrays to be summed.
    float[] v1 = new float[100], v2 = new float[100];
    for (int i = 0; i < v1.Length; i++)
    {
        v1[i] = i;
        v2[i] = 2 * i;
    }

    // Device buffers backed by the host arrays.
    ComputeBuffer<float> bufV1 = new ComputeBuffer<float>(context, ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer, v1);
    ComputeBuffer<float> bufV2 = new ComputeBuffer<float>(context, ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer, v2);

    // Bind the buffers to the kernel arguments.
    kernelVecSum.SetMemoryArgument(0, bufV1);
    kernelVecSum.SetMemoryArgument(1, bufV2);

    // Command queue for one device of the context. Without a queue nothing runs.
    ComputeCommandQueue queue = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);

    // Launch one work item per element (global size = v1.Length).
    queue.Execute(kernelVecSum, null, new long[] { v1.Length }, null, null);

    // Read the result (written into v1's buffer) back into a pinned host array.
    float[] arrC = new float[100];
    GCHandle arrCHandle = GCHandle.Alloc(arrC, GCHandleType.Pinned);
    try
    {
        queue.Read<float>(bufV1, true, 0, 100, arrCHandle.AddrOfPinnedObject(), null);
    }
    finally
    {
        arrCHandle.Free(); // release the pin, otherwise arrC stays pinned
    }
}
```

Below is the same program implemented with OpenCLTemplate.dll:

```csharp
using System;
using System.Collections.Generic;

// Inside the Form class:
private void btnOpenCL_Click(object sender, EventArgs e)
{
    // Program executed on the device (GPU or CPU), written in the C99-based OpenCL C.
    // Each work item adds one element of v2 to v1 (the original used '*' here by mistake).
    string vecSum = @"
        __kernel void floatVectorSum(__global float * v1, __global float * v2)
        {
            int i = get_global_id(0);
            v1[i] = v1[i] + v2[i];
        }";

    // Initialize OpenCL. By default only GPU devices are used.
    // To use both GPU and CPU, call instead:
    // OpenCLTemplate.CLCalc.InitCL(Cloo.ComputeDeviceTypes.All);
    OpenCLTemplate.CLCalc.InitCL();

    // Devices that were found.
    List<Cloo.ComputeDevice> L = OpenCLTemplate.CLCalc.CLDevices;

    // Use the command queue of the first device.
    OpenCLTemplate.CLCalc.Program.DefaultCQ = 0;

    // Compile vecSum.
    OpenCLTemplate.CLCalc.Program.Compile(new string[] { vecSum });

    // Get the kernel by name.
    OpenCLTemplate.CLCalc.Program.Kernel VectorSum = new OpenCLTemplate.CLCalc.Program.Kernel("floatVectorSum");

    int n = 100;
    float[] v1 = new float[n], v2 = new float[n], v3 = new float[n];

    // Fill the input arrays.
    for (int i = 0; i < n; i++)
    {
        v1[i] = i;
        v2[i] = i * 2;
    }

    // Copy the arrays to the device.
    OpenCLTemplate.CLCalc.Program.Variable varV1 = new OpenCLTemplate.CLCalc.Program.Variable(v1);
    OpenCLTemplate.CLCalc.Program.Variable varV2 = new OpenCLTemplate.CLCalc.Program.Variable(v2);

    // Kernel arguments, in the order of the kernel's parameters.
    OpenCLTemplate.CLCalc.Program.Variable[] args = new OpenCLTemplate.CLCalc.Program.Variable[] { varV1, varV2 };

    // Number of work items (global size): one per element.
    int[] workers = new int[1] { n };

    // Run VectorSum with these arguments on n work items.
    VectorSum.Execute(args, workers);

    // Read the result (stored in varV1 on the device) into v3.
    varV1.ReadFromDeviceTo(v3);
}
```

As you can see, the second option is much simpler and more intuitive.
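Given the driver problems described above, it may be worth checking the array read back from the device against a CPU computation before trusting either program. Below is a minimal sketch of such a check. `ResultCheck` and `FirstMismatch` are hypothetical names, not part of cloo or OpenCLTemplate, and the tolerance is an assumption that allows for small floating-point differences between devices.

```csharp
using System;

// Hypothetical helper (not part of cloo or OpenCLTemplate): compares an array
// read back from the device with the element-wise sum computed on the CPU.
public static class ResultCheck
{
    // Returns the index of the first element that differs from v1[i] + v2[i]
    // by more than tolerance, or -1 if every element matches.
    public static int FirstMismatch(float[] v1, float[] v2, float[] deviceResult, float tolerance)
    {
        if (v1.Length != v2.Length || v1.Length != deviceResult.Length)
            throw new ArgumentException("Arrays must have the same length.");

        for (int i = 0; i < v1.Length; i++)
        {
            float expected = v1[i] + v2[i];
            if (Math.Abs(expected - deviceResult[i]) > tolerance)
                return i;
        }
        return -1;
    }
}
```

The kernel overwrites v1 on the device. With `UseHostPointer`, the host array may change as well, so clone the inputs before the launch (for example `float[] in1 = (float[])v1.Clone();`). Then pass the clones together with `arrC` or `v3` to `FirstMismatch`.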

Links

The site devoted first and foremost to OpenCL programming from C# is www.cmsoft.com.br. Unfortunately, few people maintain it, so the examples are often inadequate, and the OpenCLTemplate library created by the site's authors is very buggy.
Another useful place is www.opentk.com, where questions about programming with cloo.dll are answered very quickly.
Much the same answers can be found at sourceforge.net/projects/cloo.
The C99-based OpenCL programming standard is described at www.khronos.org/opencl.

The follow-up article "Introduction to OpenCL" covers the features of the programming language used to program the video card.

Source: https://habr.com/ru/post/124873/
