More than seventy years have passed since Warren McCulloch and Walter Pitts described the first artificial neuron. Much has changed since then, and today neural network algorithms are used everywhere. Yet, capable as neural networks are, researchers still face a number of difficulties when working with them: from overfitting to the black box problem.
If the terms "catastrophic forgetting" and "weight regularization" mean nothing to you, read on: we will try to sort everything out in order.
Photo: Jun / CC-SA
What we love neural networks for
The main advantage of neural networks over other machine learning methods is that they can pick up deeper, sometimes unexpected patterns in the data. During training, the neurons learn to respond to incoming information in a way that generalizes, and that is how the network solves its task.
Areas where networks already find practical application include medicine (for example, cleaning noise from instrument readings or analyzing the effectiveness of a treatment), the Internet (associative information search), economics (forecasting exchange rates, automated trading), games (Go, for example) and others. Thanks to their versatility, neural networks can be applied to almost anything. However, they are not a magic pill: for them to start working properly, a lot of preliminary work is required.
Learning Neural Networks 101
One of the key properties of a neural network is its ability to learn. A neural network is an adaptive system that can change its internal structure based on incoming information. Usually this is achieved by adjusting the weights. The weights of the connections between neurons on adjacent layers are numbers that describe the significance of the signal passed between two neurons. If a trained neural network responds correctly to its input, there is no need to adjust the weights; otherwise some learning algorithm is used to change them and improve the result.
As a rule, this is done with the error backpropagation method: for each training example, the weights are adjusted so as to reduce the error. The expectation is that, with a properly chosen architecture and a sufficient set of training data, the network will sooner or later learn.
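To make the weight adjustment concrete, here is a minimal sketch of gradient-descent updates for a single-layer network; the network size, learning rate and squared-error loss are arbitrary choices made for the example, not something prescribed by the article.

```python
import numpy as np

# Toy single-layer network: prediction = sigmoid(x @ W)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 1))          # weights to be adjusted
x = rng.normal(size=(5, 3))          # 5 training examples, 3 features each
y = rng.integers(0, 2, size=(5, 1))  # their labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for step in range(100):
    y_hat = sigmoid(x @ W)                                # forward pass
    error = y_hat - y                                     # prediction error
    grad = x.T @ (error * y_hat * (1 - y_hat)) / len(x)   # backpropagated gradient
    W -= lr * grad                                        # adjust weights to reduce the error
```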
There are several fundamentally different approaches to learning, depending on the task. The first is supervised learning. Here the input data come as pairs: an object and its label. This approach is used, for example, in image recognition: training is carried out on a labelled base of pictures with manually assigned labels describing what is drawn on them. The best known of these databases is ImageNet. With such a formulation, learning is not very different from, say, the emotion recognition that Neurodata Lab is engaged in. The network is shown examples, it makes a guess, and its weights are adjusted depending on whether the guess was right. The process is repeated until accuracy reaches the desired level.
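A minimal supervised-learning sketch along these lines is shown below; synthetic (object, label) pairs stand in for a real labelled base such as ImageNet, and the network size and iteration count are assumptions made purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic labelled pairs: each object is a feature vector, each label a class.
X, y = make_classification(n_samples=2000, n_features=20, n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The network is shown labelled examples and its weights are adjusted
# until accuracy on held-out data reaches an acceptable level.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```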
The second option is unsupervised learning. Typical tasks here are clustering and some formulations of anomaly detection. In this scenario the true labels of the training data are not available, but there is still a need to find patterns. A similar approach is sometimes used to pre-train a network for a supervised task: the idea is that the initial approximation for the weights is not a random solution but one that already knows how to find patterns in the data.
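As a small illustration of working without labels, here is a clustering sketch; the two-blob data and the choice of k-means are assumptions for the example (pre-training for a supervised task would more typically use something like an autoencoder).

```python
import numpy as np
from sklearn.cluster import KMeans

# No labels are available: the algorithm has to find structure on its own.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
                  rng.normal(loc=5.0, size=(100, 2))])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(clusters[:5], clusters[-5:])   # points from the two blobs land in different clusters
```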
The third option is reinforcement learning, a strategy built on observation. Imagine a mouse running through a maze. If it turns left, it gets a piece of cheese; if it turns right, it gets an electric shock. Over time, the mouse learns to turn only left. A neural network acts the same way, adjusting its weights when the final result is "painful". Reinforcement learning is actively used in robotics ("did the robot hit the wall or get through unharmed?"), and all game-playing tasks, including the most famous of them, AlphaGo, are based on it.
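The mouse example can be sketched as a tiny tabular value-learning loop; the rewards, learning rate and exploration probability below are invented for the illustration, and real systems would of course use a neural network rather than a two-entry table.

```python
import numpy as np

# The "mouse" has one decision: turn left (cheese, +1) or right (shock, -1).
rng = np.random.default_rng(0)
q = np.zeros(2)                 # estimated value of each action: [left, right]
rewards = {0: +1.0, 1: -1.0}
alpha, epsilon = 0.1, 0.2       # learning rate and exploration probability

for episode in range(200):
    # epsilon-greedy choice: mostly exploit the best estimate, sometimes explore
    action = rng.integers(2) if rng.random() < epsilon else int(np.argmax(q))
    reward = rewards[action]
    q[action] += alpha * (reward - q[action])   # move the estimate toward the observed reward

print(q)   # the "left" action ends up with a much higher value
```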
Overfitting: what the problem is and how to solve it
The main problem of neural networks is overfitting. It means that the network "memorizes" the answers instead of picking up the patterns in the data. Science has produced several methods to combat it: these include, for example, weight regularization, batch normalization and data augmentation, among others. Sometimes an overfitted model is characterized by large absolute values of its weights.
The mechanism of this phenomenon is roughly as follows: the initial data are often very high-dimensional (one point of the training sample is represented by a large set of numbers), and the higher the dimension, the greater the probability that a randomly chosen point will be indistinguishable from an outlier. Instead of "fitting" a new point into the existing model by adjusting the weights, the neural network effectively invents an exception for itself: this point is classified by one rule, and the others by another. And there are usually a lot of such points.
An obvious way to deal with this kind of overfitting is weight regularization. It consists either in an artificial restriction on the values of the weights, or in adding a penalty to the error measure at the training stage. Such an approach does not solve the problem completely, but it most often improves the result.
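A minimal sketch of the "penalty on the error measure" variant, assuming an L2 penalty and an arbitrary regularization strength lam; the function name and the squared-error data loss are illustrative choices, not something fixed by the article.

```python
import numpy as np

def loss_with_l2(prediction_error, weights, lam=1e-2):
    """Training loss plus a penalty that grows with the magnitude of the weights."""
    data_loss = np.mean(prediction_error ** 2)
    penalty = lam * np.sum(weights ** 2)   # discourages large absolute weight values
    return data_loss + penalty
```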
The second method is based on limiting the output signal rather than the values of the weights; this is what batch normalization is about. At the training stage, data are fed to the neural network in batches. The output values for them can be anything, and their absolute values grow as the weights grow. If we subtract one value from each of them and divide the result by another value, the same for the whole batch, we keep the qualitative relations (the maximum, for example, remains the maximum), but the output becomes more convenient for the next layer to process.
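The subtraction and division described above look roughly like this; the sketch normalizes each feature by the batch mean and standard deviation and, for brevity, omits the learnable scale and shift parameters that a full batch-normalization layer would also have.

```python
import numpy as np

def batch_norm(batch, eps=1e-5):
    """Subtract the batch mean and divide by the batch std, feature by feature.
    The ordering of values is preserved, but their scale no longer depends on the weights."""
    mean = batch.mean(axis=0)
    std = batch.std(axis=0)
    return (batch - mean) / (std + eps)

x = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
print(batch_norm(x))   # each column now has zero mean and unit scale
```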
The third approach does not always work. As already mentioned, an overfitted neural network treats many points as anomalies that it wants to handle separately. The idea is to augment the training sample with points that are of the same nature as the original sample but generated artificially. However, a number of related problems immediately arise: choosing the parameters of the augmentation, a critical increase in training time, and others.
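A sketch of what "generated artificially, but of the same nature" can mean for images; the flip-and-noise transformations and their parameters are assumptions made for illustration, and real pipelines choose augmentations to match the data.

```python
import numpy as np

def augment(image, rng):
    """Produce a new training point of the same nature as the original image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                                   # horizontal flip
    out = out + rng.normal(scale=0.01, size=out.shape)       # small pixel noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((28, 28))                     # a stand-in for a real training image
extra = [augment(image, rng) for _ in range(5)]  # artificially generated variants
```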
The effect of removing an anomalous value from the training data set (source)
Finding these anomalies in the training set stands out as a separate problem; sometimes it is even treated as a task in its own right. The image above demonstrates the effect of excluding an anomalous value from a set, and in the case of neural networks the situation is similar. True, finding and eliminating such values is not a trivial task, and special techniques are used for this purpose; you can read more about them via the links (here and here).
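As one generic example of such a technique (not the specific methods from the linked articles), here is a sketch that flags and removes suspected anomalies with an isolation forest; the synthetic data and the contamination rate are assumptions made for the illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 2))
outliers = rng.uniform(low=6, high=8, size=(5, 2))   # points far from the bulk of the data
data = np.vstack([normal, outliers])

detector = IsolationForest(contamination=0.05, random_state=0).fit(data)
labels = detector.predict(data)          # -1 marks suspected anomalies
clean = data[labels == 1]                # training set with the anomalies removed
```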
One network, one task, or "the problem of catastrophic forgetting"
Working in dynamically changing environments (financial ones, for example) is hard for neural networks. Even if you manage to train a network successfully, there is no guarantee it will keep working in the future. Financial markets are constantly transforming, so what worked yesterday can just as easily "break down" today.
Here researchers either have to test various network architectures and choose the best of them, or use dynamic neural networks. The latter "watch" for changes in the environment and adjust their architecture accordingly. One of the algorithms used in this case is the multi-swarm optimization (MSO) method.
Moreover, neural networks have a particular property called catastrophic forgetting. It boils down to the fact that a neural network cannot be trained on several tasks one after another: with each new training sample all the neuron weights are rewritten, and past experience is "forgotten".
Of course, scientists are working on solving this problem. DeepMind researchers recently proposed a way to combat catastrophic forgetting: the weights that matter most for some task A are artificially made more resistant to change while the network is learning a task B.
The new approach is called Elastic Weight Consolidation, by analogy with an elastic spring. Technically it is implemented as follows: each parameter of the neural network is assigned a value F that determines its importance within a specific task. The larger F is for a particular parameter, the harder it is to change its weight when learning a new task. This allows the network to "memorize" key skills. The technique lost out to "highly specialized" networks on individual tasks, but showed itself at its best over the sum of all the stages.
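The "elastic spring" can be written down as a quadratic penalty that pulls important weights back toward their task-A values; this is only a sketch of that idea, with F taken as a given per-parameter importance array (the Fisher information in the original paper) and lam an arbitrary stiffness chosen for the example.

```python
import numpy as np

def ewc_loss(task_b_loss, weights, old_weights, fisher, lam=100.0):
    """Loss on task B plus an 'elastic spring' term that pulls the weights
    judged important for task A (large fisher values) back toward the
    values they had after training on task A."""
    penalty = np.sum(fisher * (weights - old_weights) ** 2)
    return task_b_loss + 0.5 * lam * penalty
```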
Reinforced black box
Another difficulty with neural networks is that ANNs are, in effect, black boxes. Strictly speaking, apart from the result you cannot pull anything out of a neural network, not even statistics, and it is hard to understand how the network makes its decisions. The only case where this is not quite true is convolutional neural networks in recognition tasks: some of their intermediate layers have the meaning of feature maps (one connection indicates whether some simple pattern occurred in the original image), so the excitation of various neurons can be traced.
Naturally, this makes it quite difficult to use neural networks in applications where errors are critical. For example, fund managers cannot understand how a neural network makes its decisions, which makes it impossible to correctly assess the risks of trading strategies. Similarly, banks that turn to neural networks to model credit risk will not be able to say why a given client has been assigned exactly this credit rating.
Therefore, neural network developers are looking for ways around this limitation. For example, work is being done on so-called rule-extraction algorithms to make architectures more transparent. These algorithms extract information from a neural network either in the form of mathematical expressions and symbolic logic, or in the form of decision trees.
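One simple approach in this spirit (a surrogate model, not any specific published rule-extraction algorithm) is to fit an interpretable decision tree to the network's own predictions and then read its rules; the data, model sizes and tree depth below are assumptions made for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# Train the "black box" and record its answers.
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
network_answers = black_box.predict(X)

# Fit an interpretable tree to mimic the network, then read off its rules.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, network_answers)
print(export_text(surrogate))
```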
Neural networks are only a tool
Of course, artificial neural networks are actively helping to develop new technologies and improve existing ones. Programming self-driving cars, in which neural networks analyze the environment in real time, is at the peak of popularity today. Year after year IBM Watson discovers new application areas, including medicine. Google has a whole division dealing directly with artificial intelligence.
However, a neural network is sometimes not the best way to solve a problem. For example, networks "lag behind" in areas such as the creation of high-resolution images, the generation of human speech and the deep analysis of video streams. Working with symbols and recursive structures is not easy for neural systems either, and the same is true for question-answering systems.
Initially, the idea of neural networks was to copy, and even recreate, the mechanisms of brain function. However, humanity still needs to solve the problem of the speed of neural networks and to develop new algorithms for logical inference. Existing algorithms fall short of the capabilities of the brain by at least an order of magnitude, which is unsatisfactory in many situations.
At the same time, scientists have still not fully decided on the direction in which neural networks should develop. The industry is trying both to bring neural networks as close as possible to a model of the human brain and to generate technologies and conceptual schemes that abstract away from all "aspects of human nature". Today it is something like an "open work" (to use Umberto Eco's term), where almost any experiment is permissible and fantasies are acceptable.
The work of scientists and developers engaged with neural networks requires deep preparation, extensive knowledge and non-standard methods, since a neural network by itself is not a "silver bullet" capable of solving any problem or task without human participation. It is a comprehensive tool that can do amazing things in skilled hands. And everything is still ahead of it.