
How to speed up machine learning on a GPU - a new open source platform is introduced

Nvidia has introduced Rapids, an open source platform designed to speed up machine learning algorithms on the GPU. Below we look at the tool's features and its alternatives.


/ photo by Martin Brigden CC

The problem of training neural networks


Nvidia's technology stack includes a parallel computing architecture called CUDA. Its goal is to accelerate computation by offloading some tasks from the CPU to the GPU. In some cases this speeds up applications and algorithms by a factor of 18.
For this reason, CUDA has found widespread use in machine learning. For example, researchers from universities in Florida and North Carolina are using it to develop a neural network engine for quantum simulations.

A large number of libraries are used to develop ML algorithms. Many of them are written in Python, but not all of them support CUDA; the scikit-learn and pandas Python libraries are examples. To run Python code on the CUDA architecture, researchers use separate libraries such as Numba or PyCUDA. In that case the code of some components must be rewritten by hand, which can be difficult because it requires knowledge of GPU programming.
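To illustrate the gap described above, here is the same vector operation written as ordinary Python and, in commented form, as the separate CUDA kernel it would require under Numba (the kernel is a sketch that assumes numba is installed and a CUDA device is present):

```python
# Plain Python version of SAXPY (y = a*x + y): runs anywhere, but on the CPU.
def saxpy_cpu(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy_cpu(2.0, [1.0, 2.0, 3.0], [10.0, 10.0, 10.0]))
# -> [12.0, 14.0, 16.0]

# The GPU version cannot reuse the code above: with Numba it must be
# rewritten as a CUDA kernel (sketch; requires numba and a CUDA device):
#
# from numba import cuda
#
# @cuda.jit
# def saxpy_gpu(a, x, y, out):
#     i = cuda.grid(1)          # global thread index
#     if i < out.size:
#         out[i] = a * x[i] + y[i]
```

This hand-rewriting of each hot loop is exactly the overhead Rapids aims to remove.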

Nvidia Solution


To automate this code migration, Nvidia has introduced a new open platform, Rapids. Developers no longer need to resort to separate libraries: they simply write Python code, and Rapids automatically optimizes it to run on the GPU.

Rapids keeps a shared database in GPU memory so that its components can exchange data between processes. The data is stored in the Apache Arrow format, which is uniform across all of the platform's tools. According to Nvidia, this approach speeds up the machine learning process by a factor of 50 compared with systems that combine graphics and central processing units.
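The benefit of a shared columnar format can be sketched without pyarrow itself. Below, the same records are laid out row-wise and then column-wise in plain Python; this is a toy model of Arrow's layout, not its actual API:

```python
# The same table stored two ways.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.2},
    {"id": 3, "price": 7.1},
]

# Columnar layout: each field becomes one contiguous array, which is what
# lets Arrow-based tools hand a whole column to another process without
# copying individual records.
columns = {
    "id":    [r["id"] for r in rows],
    "price": [r["price"] for r in rows],
}

# Scanning a single field now touches one list instead of every record:
total = sum(columns["price"])
```

Because every Rapids tool agrees on this layout, data can stay resident in GPU memory between pipeline stages instead of being serialized back and forth.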

The Rapids platform also includes tools that cover the entire neural network workflow on a graphics chip, from data preparation to producing results.

The number of solutions in the Rapids GitHub repository is growing steadily. For example, the cuDF library handles data preparation with a pandas-like dataframe API, and the cuML library lets you build machine learning algorithms without going into the details of CUDA programming.
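Because the cuDF API deliberately mirrors pandas, porting often comes down to changing an import. A hedged sketch of that drop-in pattern follows; the cudf-specific lines assume the Rapids packages and a CUDA GPU are present, while the fallback path works anywhere:

```python
import importlib
import importlib.util

def pick_dataframe_backend():
    """Return the name of the first available dataframe library.

    Prefers cuDF (GPU) when installed, falls back to pandas (CPU).
    """
    for name in ("cudf", "pandas"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None

backend = pick_dataframe_backend()
# if backend:
#     df_lib = importlib.import_module(backend)
#     df = df_lib.DataFrame({"x": [1, 2, 3]})  # same call in both libraries
```

The commented lines are illustrative: the point is that the DataFrame constructor call is identical in both libraries, so the GPU becomes an installation detail rather than a rewrite.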

Nvidia will continue to develop the platform. The project's creators plan to add tools for data visualization, graph analytics and deep learning to Rapids. The platform also integrates with the Apache Spark framework.

What the community thinks about the platform


The technology community welcomed the Rapids release, but its further development raised several questions from experts and users.

For example, executives at Cisco, Dell, NetApp, Lenovo and other companies expressed support for the new solution. Anaconda CEO Scott Collison said that Rapids will simplify collecting and preparing data for training complex AI systems. Wes McKinney, the creator of Apache Arrow and pandas, agrees: in his view, Rapids will boost productivity in feature engineering tasks.


/ photo Sander van der Wel CC

However, part of the community believes that Rapids cannot be considered a truly open source project: the system works only with Nvidia cards, and the release may be a marketing ploy to attract new customers. The company has not yet said whether the platform will support devices from other manufacturers.

Who already uses Rapids


IBM plans to integrate the platform into its services for working with artificial intelligence systems: PowerAI, Watson and IBM Cloud. Oracle has also announced support for Rapids; the platform is available on the Oracle Cloud infrastructure.

Walmart and Uber have also tested the new Nvidia product. Rapids helped Walmart improve the algorithms behind its inventory management system; according to the retailer's representatives, it accelerated the company's machine learning algorithms. Uber, for its part, uses Rapids in developing self-driving vehicle systems.

Alternative solution


Nvidia is not the only company developing a platform to accelerate ML. AMD, for example, is working on the ROCm (Radeon Open Compute) project, an open platform for high performance GPU computing.

A distinctive feature of ROCm is that it is independent of the programming language and can work with almost any video card, including Nvidia's. For this it uses a special C++ dialect called HIP, which simplifies converting CUDA applications into portable C++ code. The conversion is performed automatically by a dedicated tool called Hipify.
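The textual side of that conversion can be illustrated with a toy renamer. This is an assumption-laden sketch: the real hipify tool parses source code properly and covers the full CUDA runtime API, not just three calls:

```python
# A few real CUDA -> HIP runtime renamings of the kind hipify applies.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree":   "hipFree",
}

def hipify_line(line):
    """Rename known CUDA runtime calls to their HIP equivalents in one line."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        line = line.replace(cuda_name, hip_name)
    return line

print(hipify_line("cudaMalloc(&ptr, n * sizeof(float));"))
# -> hipMalloc(&ptr, n * sizeof(float));
```

Because the HIP API deliberately shadows the CUDA runtime, much of a port really is mechanical renaming, which is what makes automatic conversion feasible.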

ROCm also supports a large set of math acceleration libraries, including implementations of BLAS, FFT and tensor convolution.

IT industry experts note that open source solutions for accelerating heterogeneous and GPU computing, such as ROCm and Rapids, let developers use computing resources more efficiently and extract more performance from the hardware they already have.




Source: https://habr.com/ru/post/429286/

