In early December, researchers from OpenAI presented a library of tools that helps accelerate the training of neural networks on Nvidia GPUs through the use of sparse matrices. Below we discuss the difficulties that neural network developers face and the main idea behind OpenAI's solution.
The difficulties of training large neural networks on the GPU
Graphics processing units (GPUs) are better suited for machine learning than central processing units (CPUs): their architecture lets them perform in parallel the many matrix operations used to train neural networks.
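A minimal sketch of what this looks like in practice (modern TensorFlow API, which postdates this article): a single matrix product consists of millions of independent multiply-adds, and the GPU executes them in parallel.

```python
import tensorflow as tf

# Fall back to the CPU gracefully if no GPU is present.
tf.config.set_soft_device_placement(True)

a = tf.random.normal([4096, 4096])
b = tf.random.normal([4096, 4096])

with tf.device("/GPU:0"):
    c = tf.matmul(a, b)  # one call, millions of parallel multiply-adds

print(c.shape)  # (4096, 4096)
```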
To achieve comparable results on CPUs, one would have to build an infrastructure of several CPU clusters, which is very expensive. Google's system for training neural networks on CPUs cost about $5 billion, while scientists from Stanford built a system with similar computing power on GPUs for only $33,000.
However, there are difficulties: tapping the full potential of the GPU in resource-intensive tasks is not so simple. To be processed, data must reside in GPU memory, which is limited in size, and this makes it hard to train large models. For example, the VGG-16 model requires about 14 GB, while the Nvidia Titan X has only 12 GB of memory, and Nvidia positions this card as one of the most powerful GPUs for deep learning.
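A rough back-of-the-envelope estimate shows where that memory goes (illustrative assumptions: 32-bit floats and SGD with momentum; the exact footprint also depends on batch size and framework overhead):

```python
# VGG-16 has roughly 138 million weights.
params = 138_000_000

weights   = params * 4  # ~0.55 GB for the weights themselves
gradients = params * 4  # gradients are the same size as the weights
momentum  = params * 4  # optimizer state adds another full copy

print((weights + gradients + momentum) / 1e9, "GB before activations")
# Activations cached for backpropagation grow with batch size and
# account for most of the ~14 GB figure cited above.
```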
As EvilGenius18 correctly noted in the comments, on December 7 Nvidia unveiled the new Titan V card based on the Volta architecture. It delivers 110 TFLOPS on deep learning tasks, nine times more than its predecessor.
At the same time, various approaches are used to train large neural network models effectively. One of them is processing data on the GPU in consecutive batches, with the CPU acting as a temporary staging container. The disadvantage of this approach is the resources spent on transferring data between the two.
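A minimal sketch of this batch-wise staging (modern TensorFlow API; the data and the training step here are placeholders for illustration): the full dataset stays in host memory, and prefetching overlaps the transfer of the next batch with computation on the current one.

```python
import tensorflow as tf

features = tf.random.normal([100_000, 128])                      # host-resident data
labels = tf.random.uniform([100_000], maxval=10, dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)  # stage the next batch while the GPU computes
)

for x, y in dataset:
    pass  # a real train_step(x, y) would consume each batch on the GPU
```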
It is also possible to use several GPUs simultaneously, but the number of GPUs in a single computer is limited, so a high-speed connection between computing systems is required. The inter-computer communication channel affects learning speed, since in this case the machines spend more time on “communication” than on calculations.
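For the single-machine case, a minimal sketch of synchronous multi-GPU training (modern TensorFlow API; the original article predates it):

```python
import tensorflow as tf

# One replica per local GPU; no inter-computer channel is involved yet.
strategy = tf.distribute.MirroredStrategy()
print("replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")

# model.fit(...) now splits every batch across the replicas; the gradient
# exchange between devices is the "communication" cost described above.
```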
There is another solution used in machine learning for optimization: sparse matrices. These are matrices in which most of the elements are zero. The advantage is that zero entries can be skipped in matrix operations and need not be stored explicitly, so such matrices consume less GPU memory. This speeds up the machine learning process, which is important for large models.
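A small demonstration of the memory savings (using SciPy's sparse formats for illustration; the sizes printed depend on the random pattern):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.random((4096, 4096))
dense[dense < 0.99] = 0.0  # make ~99% of the entries zero

# Compressed sparse row storage keeps only non-zeros plus their indices.
csr = sparse.csr_matrix(dense)
print("dense:", dense.nbytes / 1e6, "MB")   # ~134 MB
print("sparse:", (csr.data.nbytes + csr.indices.nbytes
                  + csr.indptr.nbytes) / 1e6, "MB")  # a few MB

y = csr @ rng.random(4096)  # the product only touches non-zero entries
```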
But there is a problem: the solutions from Nvidia, the main supplier of GPUs, do not support efficient work with sparse matrices. OpenAI, however, found a way out of this situation.
OpenAI solution
The OpenAI team has
developed software that models the work of tiny kernels that can interact with such matrices. The kernels were tested on training networks analyzing reviews on Amazon and IMDB sites. According to the team, the level of errors in working with IMDB data has
been reduced from 5.91% to 5.01%.
The kernels are implemented in CUDA, Nvidia's software and hardware architecture for parallel computing, but the OpenAI model is currently available only for TensorFlow. Scott Gray, a member of the OpenAI team, said the solution could be extended to other architectures, with the exception of Google's TPU2. Nvidia already knows about OpenAI's work and is ready to optimize its systems accordingly.
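For reference, here is a sketch modeled on the usage example from OpenAI's blocksparse repository (TensorFlow 1.x era; exact API details may differ from the published library):

```python
import numpy as np
import tensorflow as tf
from blocksparse.matmul import BlocksparseMatMul

hidden_size = 4096
block_size = 32

# A random block-level sparsity pattern: 1 marks a block of weights that
# exists, 0 marks a block that is entirely zero and is never stored.
sparsity = np.random.randint(2, size=(hidden_size // block_size,
                                      hidden_size // block_size))

bsmm = BlocksparseMatMul(sparsity, block_size=block_size)

x = tf.placeholder(tf.float32, shape=[None, hidden_size])
w = tf.get_variable("w", bsmm.w_shape, dtype=tf.float32)

y = bsmm(x, w)  # block-sparse matrix multiplication on the GPU
```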
Alternative projects
The concept of sparse matrices is also embodied in an open-source compiler called Taco. The project, developed by a team of scientists from the Massachusetts Institute of Technology in partnership with Adobe Research, became known in November. The developers were looking for a way to automate computations over sparse matrices, and they used tensors for this purpose.
IBM also reported on its machine learning developments in December. The IT giant's solution, DuHL (Duality-gap based Heterogeneous Learning), offers a new method of transferring data from the CPU to the GPU. The main task of the technology is to determine which data are most important for the learning algorithm and to feed them to the network in the right order. Studies have shown that the DuHL-based approach is 10 times faster than the classical method of sequential data transfer between processors. The company's next goal is to offer DuHL as a cloud service.
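The idea of importance-based staging can be sketched conceptually; everything below, including the scoring and the function name, is a hypothetical illustration and not IBM's implementation:

```python
import numpy as np

def select_for_gpu(importance, gpu_capacity):
    """Pick the indices of the most informative samples to keep in GPU memory."""
    return np.argsort(importance)[::-1][:gpu_capacity]

importance = np.random.rand(1_000_000)        # e.g., per-sample duality-gap scores
staged = select_for_gpu(importance, 100_000)  # the GPU holds only 10% of the data

# Training would run on the staged subset, update the importance scores,
# and re-select which samples to transfer before the next pass.
```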
But IBM is not the first to move GPU computing to the cloud. Such projects, including ones operating on the IaaS model, already exist. Nvidia was the first to offer vGPU technology; AMD and Intel now do so as well.
About OpenAI
OpenAI is a non-profit research organization co-founded by Tesla CEO Elon Musk. It aims to promote and develop artificial intelligence for the benefit of humanity. The organization cooperates closely with other institutions and researchers, providing open access to its developments.