The news that AMD and Intel have released OpenCL 2.0 drivers went unnoticed on Habr.
Many people think that this API is just another marketing buzzword. That is partly true, because almost every modern hardware product lists “OpenCL” among its supported technologies and in its advertising: the latest CPUs, GPUs, APUs (CPU + GPU), FPGAs, and CPU + FPGA combinations. Many areas of enterprise software development would like to stay away from these “fashionable” names, but thanks to the efforts of Oracle and AMD that will soon become impossible.
Massive hardware parallelism has long been present in servers, personal computers, phones and tablets, and specialized hardware accelerators. In the FPGA field, OpenCL is seen as a way to make development simpler, cheaper, and more popular. At the same time, taking advantage of what the hardware offers requires the programmer to use APIs such as OpenCL, CUDA, or OpenMP. There are, however, attempts to hide this complexity from application programmers, such as Project Sumatra and ScalaCL.
OpenCL has already made it possible to speed up the graphics editors Photoshop CC 2014 and GIMP 2.8 RC 1. The LuxRender renderer and the Cycles renderer from the Blender project also benefit from this API. Even the office suite LibreOffice uses OpenCL. I was very surprised to learn from the news that my former employer had distinguished itself by optimizing the VP9 video encoder with OpenCL.
OpenCL 2.0 was approved as a standard almost a year ago. It supports technologies available in modern hardware: Shared Virtual Memory lets you avoid unnecessary explicit copying of data between memory regions; nested parallelism lets kernel functions be scheduled for execution on the device without involving the host program, which reduces latency; pipes provide an additional way of exchanging data between kernel functions; and support for atomic operations has been extended.
What pleases me most is that I no longer need crutches to implement atomic floating-point operations in OpenCL. Strangely enough, this problem with atomic cmpxchg and volatile brought a lot of people to my blog, and other developers used the same approach before this feature appeared in the latest standard.
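For readers who have not run into this crutch, here is a minimal sketch of the idea: a floating-point add emulated with a compare-and-exchange loop over the raw 32-bit representation. It is not the code from my blog or from any particular driver; it is a host-side analog written with C11 atomics so that it compiles without an OpenCL runtime, and the names `float_to_bits`, `bits_to_float`, and `atomic_add_float` are made up for illustration. In an OpenCL 1.x kernel the same loop would, presumably, be built around `atomic_cmpxchg` on a `volatile __global` integer.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (memcpy avoids aliasing problems). */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/*
 * Hypothetical helper: atomically adds `value` to the float stored as raw
 * bits in *target and returns the value that was there before the add.
 */
static float atomic_add_float(_Atomic uint32_t *target, float value)
{
    uint32_t expected = atomic_load(target);
    uint32_t desired;
    do {
        /* Compute the new bit pattern from the value we believe is stored. */
        desired = float_to_bits(bits_to_float(expected) + value);
        /* If another thread changed *target, the exchange fails, `expected`
           is refreshed with the current contents, and the loop retries. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));
    return bits_to_float(expected);
}
```

The cell is stored as an integer on purpose: the compare-and-exchange operates on bit patterns, so the float is only ever interpreted inside the loop. Under heavy contention the loop may retry many times, which is one reason native support is preferable to this workaround.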
You can keep ignoring the advantages the hardware offers... Or you can try to use modern hardware more efficiently and build a proof of concept for your project, provided the algorithm can be parallelized and the amount of computation in the project is large enough.
What do you think about the prospects for using OpenCL in enterprise software or your project?