I was motivated to write this article by the widespread misconceptions about artificial neural networks (ANNs), especially about what they can and cannot do. I would also like to gauge how relevant ANN topics are here, and whether it is worth discussing them in more detail.
I want to look at several well-known ANN architectures, give the most general (and therefore not always perfectly accurate) information about their structure, describe their strengths and weaknesses, and outline their prospects.
I'll start with the classics.
Multilayer perceptron
The most famous and oldest architecture, in which several layers of neurons follow one another: the input layer, one or several hidden layers, and the output layer. It is almost always trained by backpropagation of error, which automatically means that for training we must provide a set of pairs "input vector / correct output". The input vector is fed to the network's input, the states of all intermediate neurons are computed in succession, and the output layer forms the output vector, which we compare with the correct one. The discrepancy gives us an error, which can be propagated back through the network's connections to compute each neuron's contribution to the final error and adjust its weights to reduce it. By repeating this procedure many thousands of times, it may be possible to train the network.
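As a rough illustration of this training loop, here is a minimal sketch of a single-hidden-layer perceptron trained by backpropagation on the XOR toy task (the layer sizes, learning rate and task are my own illustrative choices, not anything prescribed above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # input vectors
Y = np.array([[0], [1], [1], [0]], dtype=float)                # correct outputs (XOR)

W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights
W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    # forward pass: compute the states of hidden and output neurons in succession
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # the discrepancy with the correct output, propagated back through the connections
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # adjust each weight according to its contribution to the final error
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h

print(out.round(2))   # after many repetitions the outputs approach the targets
```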
This type of network usually does a very good job on tasks where:
1. The answer really depends only on what we feed to the network's input and does not depend in any way on the history of the inputs (that is, it is not a dynamic process, or at least we have supplied comprehensive information about that process at the input, in a form the network can digest).
2. The answer does not depend, or depends only weakly, on high powers and/or products of the parameters: the network is almost unable to build functions of this kind.
3. Many examples are available (ideally at least a hundred examples per network connection), or you have a lot of experience in dealing with overfitting. This is because, with its many coefficients, the network can simply memorize specific training examples and give excellent results on them, while its predictions will have nothing to do with reality if you feed it examples from outside the training sample.
Strengths: studied from every angle and works well on the tasks it suits; if it does not work on some task (genuinely does not work, and not because of a botched implementation, as is often the case), that is a reason to say the task is more complicated than it seemed.
Weaknesses: inability to work with dynamic processes, the need for a large training sample.
Prospects: nothing significant. Most of the serious tasks that still need to be solved do not fall into the class of problems solvable by a multilayer perceptron with backpropagation.
Recurrent perceptron
At first glance it looks like an ordinary perceptron; the only significant difference is that its outputs are fed back to its inputs and participate in processing the next input vector. That is, a recurrent perceptron deals not with a set of separate, unrelated patterns but with a process, and what matters is not only the inputs themselves but also the sequence in which they arrive. Because of this there are differences in training: the same backpropagation of error is used, but various tricks are needed for the error to travel back through the recurrent connection into the past (if you attack the problem head-on, you face the error having to propagate back through an unbounded number of cycles). Otherwise the situation is similar to an ordinary perceptron: for training you need a sufficiently long sequence of input/output pairs, which must be run through the network many times (or you need a mathematical model of the desired process at hand, which can be run under all possible conditions and feed its results to the network for training in real time).
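For illustration, here is a minimal sketch of this feedback structure, assuming the crudest of those tricks: the error is simply not propagated into the past at all, the fed-back output being treated as a constant (the toy sequence, sizes and learning rate are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
seq = (np.sin(np.linspace(0, 8 * np.pi, 200)) > 0).astype(float)  # toy binary process
W = rng.normal(0, 0.5, (2, 1))    # weights for [current input, previous output]
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    prev_out = 0.0
    for t in range(len(seq) - 1):
        x = np.array([seq[t], prev_out])   # input vector plus the recurrent signal
        out = sigmoid(x @ W)[0]
        err = out - seq[t + 1]             # target: the next element of the sequence
        # plain backprop step; the recurrent input is treated as a constant,
        # so the error is not sent further back in time
        W -= 0.1 * err * out * (1 - out) * x.reshape(2, 1)
        prev_out = out                     # the output participates in the next step
```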
Networks of this type usually do well at controlling dynamic processes (from the classical problem of stabilizing an inverted pendulum up to virtually any system that can be controlled at all), at predicting dynamic processes (other than the exchange rate :)), and in general at everything where, beyond the directly observable inputs, the system has some internal state that is not entirely clear how to exploit.
Strengths: the network is very well suited for working with dynamic processes.
Weaknesses: if it still does not work, it is very difficult to understand what the problem is; during training it can slip into self-excitation (when the signal fed back from the output drowns out everything arriving through the inputs); and even when a solution is obtained, it is hard to tell whether better results are achievable and how. In other words, poorly studied.
Prospects: this approach is clearly not exhausted in control problems; in fact, recurrent perceptrons are currently used quite rarely, although their potential is high. Interesting results may come from an approach in which the network continuously adapts to the controlled object, although this still requires solving the problem of learning instability.
Associative memory
This is a wide class of networks that to one degree or another resemble the Hopfield architecture, which consists of a single layer of neurons whose outputs arrive at its inputs at the next moment in time. This layer serves both as the network's input (at the initial moment the neurons' outputs are set equal to the input vector) and as its output: the values on the neurons once the run has settled are taken as the network's response. The network changes its state over time until the state stops changing. The properties of the weight matrix are chosen so that a steady state is always guaranteed to be reached (and usually this happens within a few steps). Such a network memorizes a certain number of vectors, and when any vector is applied to its input, it can determine which of the memorized ones it most resembles, hence the name. A two-layer modification of this network (heteroassociative memory) can memorize vectors not one by one but in pairs of different dimensions.
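Here is a minimal sketch of such a memory in the Hopfield style, assuming the classical Hebbian outer-product rule for building the weight matrix and a couple of toy bipolar patterns of my own choosing:

```python
import numpy as np

patterns = np.array([[1, 1, 1, -1, -1, -1],
                     [1, 1, -1, 1, -1, 1]], dtype=float)   # stored bipolar vectors
n = patterns.shape[1]

# "training" is just constructing the weight matrix, no gradient descent involved
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

def recall(x, steps=10):
    # the layer's outputs are fed back to its inputs until the state stops changing
    s = x.copy()
    for _ in range(steps):
        field = W @ s
        new_s = np.where(field > 0, 1.0, np.where(field < 0, -1.0, s))  # keep state on ties
        if np.array_equal(new_s, s):
            break
        s = new_s
    return s

noisy = np.array([1, 1, 1, -1, -1, 1], dtype=float)   # corrupted copy of the first pattern
print(recall(noisy))                                   # settles onto the most similar stored vector
```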
Networks of this type do a good job on tasks where you need to determine how similar a vector is to one of the stored reference vectors. In fact, this is the only class of problems where they are good. The Hopfield network specifically can also be used to solve optimization problems (for example, the travelling salesman problem), but its effectiveness in this area is questionable.
Strengths: very fast learning (a system of equations is solved instead of running gradient descent), the ability to remove an image from memory or add one without affecting the others; some properties of such memory resemble properties of the brain, and studying them is interesting from that standpoint.
Weaknesses: a very narrow class of solvable problems, inability to generalize from examples, and a maximum memory capacity that is rigidly tied to the dimensionality of the stored vectors (a consequence of how the network is constructed).
Prospects:
- kernel associative memory has been developed, which is capable of generalizing images and has unlimited capacity (the network grows as it fills up);
- dynamic associative memory has been developed, which memorizes not individual images but whole sequences of images, and can therefore be used to recognize elements of dynamic processes;
- dynamic associative memory demonstrates the ability, when given an input signal that corresponds to several stored sequences at once, to generate a response combining elements of those different sequences, which may be a rough model of human creativity;
- a hybrid of kernel and dynamic associative memory may yield a new level of quality in sequence recognition, for example in speech recognition.
Spiking networks
This is a special class of networks in which a signal is represented not by a real number, as in all the networks considered so far, but by a set of pulses (spikes) of identical amplitude and duration, with the information carried not by amplitude but by the intervals between pulses, by their pattern. Spiking neurons produce spikes at their output, either single ones (if the total input is not very large) or bursts (if the total input is large enough). Networks of this type reproduce the processes taking place in the human brain almost completely; the only serious difference is that for training nothing better has been found than Hebb's rule (which goes roughly like this: if the second neuron fired right after the first, the connection from the first to the second is strengthened, and if right before it, it is weakened). A number of small improvements to it have been invented, but unfortunately the brain's learning properties have not yet been reproduced.
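As a toy illustration of that rule, here is a minimal sketch that adjusts a single connection according to the relative timing of two neurons' spikes (the spike times, time window and learning constants are my own illustrative choices):

```python
a_spikes = [10, 30, 50, 70]          # spike times of the presynaptic neuron A (ms)
b_spikes = [12, 28, 52, 75]          # spike times of the postsynaptic neuron B (ms)
w = 0.5                              # initial weight of the A -> B connection

for t_a in a_spikes:
    for t_b in b_spikes:
        dt = t_b - t_a
        if 0 < dt <= 20:             # B fired right after A: strengthen the connection
            w += 0.05 * (1 - w)
        elif -20 <= dt < 0:          # B fired right before A: weaken the connection
            w -= 0.05 * w

print(round(w, 3))
```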
Networks of this type can be adapted to solve the various problems handled by other networks, but the results rarely turn out significantly better. In most cases it is only possible to repeat what has already been achieved.
Strengths: very interesting to study as a model of biological networks.
Weaknesses: almost any practical application looks unjustified; networks of other types handle the same tasks just as well.
Prospects: modeling of large-scale spiking networks in the coming years will probably yield a lot of valuable information about mental disorders and make it possible to classify normal and abnormal modes of operation of various brain regions. In the longer term, once a suitable learning algorithm is created, such networks will match or even surpass other types of neural networks in functionality, and later it may become possible to assemble from them structures suitable for direct connection to a biological brain to extend the capabilities of the intellect.
P.S. I deliberately did not touch on the Kohonen network and similar architectures, since I cannot say anything new about them, and there is already an excellent article on the topic:
habrahabr.ru/blogs/artificial_intelligence/51372

UPD: there is also an excellent article about convolutional networks, the essence of which is to train a set of kernels with which the image is convolved, applying several layers of such filtering in succession:
habrahabr.ru/blogs/artificial_intelligence/74326
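For the curious, here is a minimal sketch of the filtering step such networks are built from (using the cross-correlation form common in CNN implementations; the random image and the hand-made kernel are my own illustrative stand-ins for what a convolutional network would actually learn):

```python
import numpy as np

def convolve2d(image, kernel):
    # slide the kernel over the image and take the weighted sum at each position
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).random((8, 8))
edge_kernel = np.array([[1.0, -1.0]])          # hand-made kernel; a CNN would learn its own
print(convolve2d(image, edge_kernel).shape)    # (8, 7): one feature map of the first layer
```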