
Accelerating neural networks with the help of hashing

The industry has focused on speeding up matrix multiplication, but improving the search algorithm could yield a much bigger performance gain.

In recent years, the computer industry has been busy trying to speed up the computations required by artificial neural networks, both for training them and for running inference. In particular, a great deal of effort has gone into developing special hardware on which to perform these computations. Google developed the Tensor Processing Unit, or TPU, first presented to the public in 2016. Later, Nvidia introduced the V100 graphics processing unit, describing it as a chip specially designed for training and deploying AI, as well as for other high-performance computing needs. And plenty of startups are concentrating on other types of hardware accelerators.



Perhaps they are all making a big mistake.



That thought was voiced in a paper that appeared in mid-March on the arXiv website. In it, the authors, Beidi Chen, Tarun Medini, and Anshumali Srivastava of Rice University, argue that the specialized hardware developed for running neural networks may be optimized for the wrong algorithm.


The thing is that the performance of a neural network usually depends on how quickly the hardware can multiply the matrices used to determine the output of each artificial neuron - its “activation” - for a given set of inputs. Matrices are used because each input value to a neuron is multiplied by a corresponding weight parameter, and the products are then all summed together - and this multiply-and-add is the basic operation of matrix multiplication.
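To make the multiply-and-add concrete, here is a minimal sketch in Python with NumPy (the function names and numbers are illustrative only, not code from the paper):

```python
import numpy as np

def neuron_activation(x, w, b):
    """One artificial neuron: multiply each input by its weight,
    sum the products, add the bias, then apply a nonlinearity."""
    z = np.dot(w, x) + b      # the multiply-and-add step
    return max(0.0, z)        # ReLU as an example nonlinearity

def layer_activations(x, W, b):
    """Stacking every neuron's weight vector as a row of W turns the
    whole layer's multiply-and-adds into one matrix-vector product."""
    return np.maximum(0.0, W @ x + b)

x = np.array([0.5, 0.1, 0.9])
w = np.array([0.2, -0.4, 0.7])
print(neuron_activation(x, w, b=0.1))  # ~0.79
```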



Researchers at Rice University, like some other scientists, noticed that the activations of many neurons in any given layer of a neural network are so small that they have no effect on the output computed by subsequent layers. So, if you know which neurons those are, you can simply ignore them.



It may seem that the only way to find out which neurons in a layer will not be activated is to first perform all of the matrix multiplications for that layer. But the researchers realized that you can actually decide this far more efficiently if you look at the problem from a different angle. “We approach this as a search problem,” says Srivastava.



That is, instead of computing the matrix products and seeing which neurons get activated for given input data, you can simply look those neurons up in a database. The advantage of framing the problem this way is that you can draw on a general strategy that computer scientists have long refined for speeding up data retrieval: hashing.



Hashing lets you quickly check whether a value is present in a database table without having to scan each row in sequence. You use a hash - easily computed by applying a hash function to the value you want - which indicates where that value would be stored in the database. Then you need to check only that one place to find out whether the value is stored there.
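A toy illustration of that lookup in Python (everything here is hypothetical, just to show the mechanics): the hash of a value names its bucket, so a membership check inspects one bucket instead of scanning the whole table.

```python
# A miniature hash index: hash(value) picks one of NUM_BUCKETS slots,
# so a lookup examines a single bucket rather than every stored row.
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(value):
    buckets[hash(value) % NUM_BUCKETS].append(value)

def contains(value):
    return value in buckets[hash(value) % NUM_BUCKETS]

insert("alice")
print(contains("alice"))  # True  -- found by checking a single bucket
print(contains("bob"))    # False -- again, only one bucket inspected
```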



The researchers did something similar for the computations involved in running a neural network. The following example helps illustrate their approach:



Suppose we have built a neural network that recognizes handwritten digits. Suppose the input is an array of 16x16 grayscale pixels - 256 numbers in all. We feed this data into a single hidden layer of 512 neurons, whose activations are in turn fed into an output layer of 10 neurons, one for each possible digit.







Tables from networks: before computing the activations of neurons in the hidden layers, we use hashes to determine which neurons will be activated. Here the hash H1 of the input is used to look up the corresponding neurons in the first hidden layer - in this case, neurons 2 and 4. A second hash, H2, shows which neurons of the second hidden layer will contribute. This strategy reduces the number of activations that have to be computed.



Training such a network is rather difficult, but let us set that aside for the moment and imagine we have already tuned the weights of every neuron so that the network recognizes handwritten digits perfectly. When a clearly written digit arrives at the input, the activation of one output neuron (the one corresponding to that digit) will be close to 1, and the activations of the other nine will be close to 0. Classically, running such a neural network requires one matrix multiplication for each of the 512 hidden neurons and one more for each output neuron - which adds up to a lot of multiplications.
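As a rough sketch of that dense baseline (the shapes are taken from the example above; the code itself is illustrative, with random stand-ins for trained weights, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# 16x16 grayscale image -> 512 hidden neurons -> 10 output neurons.
x  = rng.random(256)                  # flattened input pixels
W1 = rng.standard_normal((512, 256))  # stand-in for trained hidden weights
W2 = rng.standard_normal((10, 512))   # stand-in for trained output weights

hidden = np.maximum(0.0, W1 @ x)  # 512 x 256 = 131,072 multiply-adds
scores = W2 @ hidden              # 10 x 512  =   5,120 multiply-adds

print(scores.argmax())  # index of the winning output neuron
```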



The researchers take a different approach. The first step is to hash the weights of each of the 512 neurons in the hidden layer using “locality-sensitive hashing,” one property of which is that similar inputs produce similar hash values. They can then group neurons with similar hashes, which means those neurons have similar sets of weights. Each group can be stored in a database, indexed by the hashes of the input values that would lead to the activation of that group of neurons.
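Here is one way such a table could be built, sketched with signed random projections (SimHash), a classic locality-sensitive family for angular similarity. The paper's engine uses its own LSH family and several tables in parallel, but the bucketing idea is the same; all names and sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HIDDEN, BITS = 256, 512, 6

# Random hyperplanes shared by every hash computation.
planes = rng.standard_normal((BITS, DIM))

def lsh_signature(v):
    """Signed random projections: one bit per hyperplane. Vectors
    pointing in similar directions tend to get the same signature."""
    bits = (planes @ v) > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

W1 = rng.standard_normal((HIDDEN, DIM))  # stand-in for trained hidden weights

# Bucket every hidden neuron under the hash of its weight vector.
table = {}
for neuron_id, w in enumerate(W1):
    table.setdefault(lsh_signature(w), []).append(neuron_id)
```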



After all this hashing, it is easy to determine which hidden neurons will be activated by some new input. Run the 256 input values through the easily computed hash functions and use the result to look up in the database exactly those neurons that will be activated. That way you need to compute activation values only for the handful of neurons that matter. There is no need to compute the activations of all the other neurons in the layer only to discover that they contribute nothing to the result.
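Continuing the sketch above (reusing `planes`, `lsh_signature`, `W1`, and `table`), the hidden-layer pass then touches only the neurons found in the input's bucket. A real engine would query several tables to avoid missing neurons that should fire; this single-table version just shows the idea:

```python
def sparse_hidden_activations(x):
    """Hash the input, fetch the matching bucket, and compute dot
    products only for the neurons stored there."""
    active = table.get(lsh_signature(x), [])  # likely-to-fire neurons
    hidden = np.zeros(HIDDEN)
    for i in active:                          # a handful of dot products...
        hidden[i] = max(0.0, W1[i] @ x)       # ...instead of all 512
    return hidden, active

x = rng.random(DIM)  # a new input vector
hidden, active = sparse_hidden_activations(x)
print(len(active), "of", HIDDEN, "hidden neurons computed")
```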



Feeding input to such a neural network can be thought of as issuing a query to the database, asking it to find all the neurons that would be activated by a direct computation. You get the answer quickly because you are looking up hashes. You can then compute the activations of just the small number of neurons that actually matter.



The researchers applied this technique, called SLIDE (Sub-LInear Deep Learning Engine), to training a neural network - a process that is more computationally demanding than using the trained network for inference. They then compared the performance of their training algorithm against a more traditional approach running on a powerful graphics processor - specifically, an Nvidia V100 GPU. What they report is striking: “Our results show that, on an average CPU, SLIDE can be orders of magnitude faster than the best possible alternative implemented on the best hardware, at any accuracy.”



It is too early to say whether these results (which have not yet been peer reviewed) will stand up to scrutiny and force chipmakers to take a fresh look at how they design special-purpose hardware for deep learning. But the work definitely highlights the danger of committing to a particular kind of hardware when there is a chance that a new and better algorithm for running neural networks will come along.

Source: https://habr.com/ru/post/447806/


