
OpenCL: what is it and why is it needed (if there is CUDA)?



Hello, dear Habra community.

Many of you have probably heard or read about OpenCL, the new standard for developing applications for heterogeneous systems. To be precise, it is not a standard for developing GPU applications, as many believe; OpenCL was originally conceived as something broader: a single standard for writing applications that have to run on systems with different processors, accelerators and expansion cards.

OpenCL prerequisites


The main area where heterogeneous systems are found is high-performance computing: from modeling physical processes in the boundary layer to video encoding and rendering of three-dimensional scenes. Previously, such problems were solved on supercomputers or very powerful desktop systems. With the advent of NVIDIA CUDA and AMD Stream, it became relatively simple to write programs that use the computing power of the GPU.
It should be noted that similar programs had been written before, but it was NVIDIA CUDA that drove the growth in GPGPU's popularity by simplifying the process of creating GPGPU applications. The first GPGPU applications used shaders as kernels (a kernel in the CUDA and OpenCL sense), and data was packed into textures, so you had to be familiar with OpenGL or DirectX. A little later the Brook language appeared and made the programmer's life somewhat easier (AMD Stream was built on top of this language; it uses Brook+).

CUDA began to gain momentum, and meanwhile (or rather earlier), in a forge deep underground at the foot of Mount Fuji, Japanese engineers forged the absolute power of Cell (born of a collaboration between IBM, Sony and Toshiba). Today Cell is used in the supercomputers supplied by IBM, and the most powerful supercomputers in the world (according to the TOP500) are built on it. Less than a year ago, Toshiba announced the SpursEngine PC expansion card, which accelerates video decoding and other demanding operations using the compute units (SPEs) designed for Cell. Wikipedia has an article briefly describing SpursEngine and how it differs from Cell.
At about the same time (roughly a year ago), S3 Graphics (in fact, VIA) also came back to life, presenting its new graphics adapter, the S3 Graphics Chrome 500. According to the company itself, this adapter can also accelerate all sorts of computations, and it ships with a software product (a graphics editor) that takes advantage of such acceleration. A description of the technology is available on the manufacturer's website.

So, what we have: the machine on which the computations are performed can contain x86, x86-64, Itanium or SpursEngine (Cell) processors, as well as NVIDIA, AMD or VIA (S3 Graphics) GPUs. Each of these processor types has its own SDK (well, except perhaps VIA), its own programming language and programming model. That is, if you want your rendering engine or your Boeing 787 wing-load calculation program to run on a simple workstation, on a BlueGene supercomputer, or on a computer equipped with two NVIDIA Tesla accelerators, you will have to rewrite a large part of the program, since each platform, due to its architecture, has its own set of strict limitations.
Since programmers are lazy people who do not want to write the same thing for five different platforms, with all their quirks, and learn different tools and models, and customers are greedy people who do not want to pay for the program on each platform as a separate product or pay for programmer training courses, it was decided to create a single standard for programs running in a heterogeneous environment. This means that a program, generally speaking, must be able to run on a computer in which NVIDIA and AMD GPUs and a Toshiba SpursEngine are installed simultaneously.

Solution to the problem


To develop the open standard, it was decided to involve people who already had (very successful) experience in developing such standards: the Khronos Group, which already has OpenGL, OpenML and much more to its name. OpenCL itself is a trademark of Apple Inc., as stated on the Khronos Group website: "OpenCL is a trademark of Apple Inc." Conformant products can be found here: http://developer.apple.com/softwarelicensing/agreements/opencl.html. Besides Apple, such IT movers and shakers as AMD, IBM, Activision Blizzard, Intel, NVIDIA and others took part in the development (and, of course, the financing) of the standard (full list here).
NVIDIA did not particularly advertise its participation in the project and kept rapidly improving the functionality and performance of CUDA. Meanwhile, several leading NVIDIA engineers took part in the creation of OpenCL. This participation probably determined, to a large extent, the syntactic and ideological similarity of OpenCL and CUDA. Programmers only benefited from this: it will be easier to switch from CUDA to OpenCL if the need arises.

The first version of the standard was published at the end of 2008 and since then it has already undergone several revisions.

Almost immediately after the standard was published, NVIDIA stated that OpenCL support would pose no difficulty for it and would soon be implemented in the GPU Computing SDK on top of the CUDA Driver API. Nothing of the kind was heard from NVIDIA's main competitor, AMD.
NVIDIA has released an OpenCL driver and it has been tested for conformance with the standard, but it is still available only to a limited circle of people, namely registered developers (anyone can apply for registration; in my case the review took two weeks, after which an invitation arrived by mail). The restricted access to the SDK and drivers suggests that there are still problems or bugs that cannot yet be fixed, that is, the product is still in beta testing.
Implementing OpenCL was a fairly easy task for NVIDIA, since the basic ideas are similar: both CUDA and OpenCL are extensions of the C language with similar syntax, and both use the data-parallel (SIMD) model as the primary programming model. OpenCL also supports a task-parallel programming model, in which different kernels can run at the same time (each work-group contains a single work-item). The two technologies are so similar that NVIDIA even released a special document on how to write CUDA code so that it is easy to port to OpenCL later; the kernel sketch below illustrates the point.
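To make the similarity concrete, here is a minimal vector-addition kernel written in both dialects. This is my own sketch rather than an excerpt from the NVIDIA document; the name vec_add and the signatures are illustrative only.

```c
/* CUDA C kernel: each thread adds one element of the arrays. */
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

/* The same kernel in OpenCL C: only the qualifiers and the way
   the global index is obtained differ. */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      int n)
{
    int i = get_global_id(0);
    if (i < n)
        c[i] = a[i] + b[i];
}
```

The body of the kernel is identical; only the address-space qualifiers and the thread-index query change, which is exactly why porting between the two is straightforward.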

How things stand at the moment


The main problem with NVIDIA's OpenCL implementation is poor performance compared to CUDA, but with each new driver release the performance of OpenCL on top of CUDA gets closer to that of native CUDA applications. According to the developers, the performance of CUDA applications themselves went the same way: from relatively low with the early driver versions to impressive now.

And what was AMD doing at this time? After all, AMD, as a proponent of open standards (closed PhysX vs. open Havok; the expensive Intel Thread Profiler vs. the free AMD CodeAnalyst), placed big bets on the new technology, especially since AMD Stream could not compete with NVIDIA CUDA in popularity because Stream lagged behind CUDA technically.
In the summer of 2009, AMD announced support for and compliance with the OpenCL standard in the new version of the Stream SDK. In fact, it turned out that support was implemented only for the CPU. Yes, exactly, and this contradicts nothing: OpenCL is a standard for heterogeneous systems, and nothing prevents you from running kernels on the CPU; moreover, this is very convenient if there is no other OpenCL device in the system, since the program keeps working, only more slowly. You can also use all the computing power in the machine, both the GPU and the CPU, although in practice this makes little sense: kernels executed on the CPU take much longer than those executed on the GPU, and the processor becomes the bottleneck. But for debugging applications it is more than convenient (see the device-selection sketch below).
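As an illustration of that fallback behaviour, here is a minimal host-side sketch (my own, not AMD's or NVIDIA's sample code) that asks the OpenCL runtime for a GPU device and falls back to the CPU if none is found:

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    /* Take the first available OpenCL platform. */
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
        fprintf(stderr, "No OpenCL platform found\n");
        return 1;
    }

    /* Prefer a GPU; fall back to the CPU so the program still runs
       (only more slowly) when no GPU OpenCL device is available. */
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS)
        err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "No OpenCL device found\n");
        return 1;
    }

    char name[256];
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("Running kernels on: %s\n", name);

    /* The selected device can now be used to create a context and queue. */
    return 0;
}
```

For debugging, one could simply swap the order of the two clGetDeviceIDs calls and prefer CL_DEVICE_TYPE_CPU.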
OpenCL support for AMD graphics adapters should also not be long in coming: according to the latest reports from the company, the version for graphics chips is now at the stage of conformance testing against the standard's specification. After that, it will be available to everyone.
Since OpenCL has to run on top of some hardware-specific layer, for the standard to truly become universal across heterogeneous systems, the corresponding layers (drivers) must also be released for IBM Cell and Intel Larrabee. So far nothing has been heard from these IT giants, so for now OpenCL remains just another GPU development tool alongside CUDA, Stream and DirectX Compute.

Apple also announces OpenCL support, which, however, is provided by NVIDIA CUDA.
Also, third-party developers are currently offered:

Conclusion


OpenCL is of interest to a wide range of IT companies, from game developers to chip manufacturers, which means it has a good chance of becoming the de facto standard for high-performance computing development, taking that title from the current sector leader, CUDA.

In the future, I plan to write a more detailed article about OpenCL itself, describing what this technology is, its features, advantages and disadvantages.
Thanks for your attention.

Interesting links:


www.khronos.org/opencl - the OpenCL page on the Khronos Group site
www.nvidia.com/object/cuda_opencl.html - the NVIDIA OpenCL page (at the bottom of the page there are links to various documents: the OpenCL Programming Guide, etc.)
forums.amd.com/devforum/categories.cfm?catid=390&entercat=y - AMD OpenCL Forum
habrahabr.ru/blogs/CUDA/55566 - a short overview of OpenCL on Habr
developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx - AMD Stream SDK v2.0 beta page
ati.amd.com/technology/streamcomputing/gpgpu_history.html - the history of the development of GPGPU according to AMD

Source: https://habr.com/ru/post/72247/

