Historically, over more than half a century, artificial neural networks have experienced both periods of rapid take-off and heightened public attention, and periods of skepticism and indifference that replaced them. In good times, scientists and engineers believe that a universal technology has finally been found, one that can replace a human in any cognitive task. Like mushrooms after the rain, new models of neural networks spring up, and their authors, professional mathematicians, argue intensely about how biologically plausible the proposed models are. Professional biologists watch these discussions from the sidelines, periodically breaking down and exclaiming “But there is nothing like that in real nature!” - with little effect, because neural network mathematicians listen to biologists, as a rule, only when the biologists' facts agree with their own theories. However, over time a pool of tasks gradually accumulates on which neural networks perform frankly poorly, and people's enthusiasm cools.
Nowadays, neural networks are again at the zenith of fame thanks to the invention of the “unsupervised” pre-training method based on Restricted Boltzmann Machines (RBM), which makes it possible to train deep neural networks (i.e., networks with a large number of layers of neurons), and to the success of deep neural networks in the practical problems of speech [1] and image [2] recognition. For example, speech recognition in Android is implemented with deep neural networks. How long this will last and how well deep neural networks will justify the expectations placed on them is unknown.
Meanwhile, in parallel with all the scientific disputes, currents, and trends, a community of neural network users stands out clearly: software engineers and practitioners who are interested in the applied aspect of neural networks, in their ability to learn from collected data and solve recognition problems. Many practical classification and forecasting problems are handled excellently by well-developed, relatively small models: the multilayer perceptron (MLP) and the radial basis function network (RBF). These neural networks have been described many times; I would recommend the following books, in order of my personal sympathy for them: Osowski [3], Bishop [4], Haykin [5]; there are also good courses on Coursera and similar resources.
However, the general approach to using neural networks in practice differs fundamentally from the usual deterministic developer mindset of “I programmed it, it works - so it always works.” By their nature, neural networks are probabilistic models, and the approach to them must be completely different. Unfortunately, many programmers new to machine learning technologies in general and to neural networks in particular make systematic errors when working with them, get frustrated, and give the whole thing up. The idea of writing this treatise for Habr arose after communicating with such frustrated users of neural networks - excellent, experienced, confident programmers.
Here is my list of rules and typical errors in the use of neural networks.
1. If it is possible not to use neural networks, do not use them. Neural networks make it possible to solve a problem when no algorithm can be proposed even after looking through the data many (or very many) times. For example, when there is a lot of data and it is nonlinear, noisy and/or high-dimensional.
2. The complexity of the neural network must be adequate to the complexity of the task. Modern personal computers (for example, Core i5, 8 GB RAM) make it possible to train, in a comfortable amount of time, neural networks on samples of tens of thousands of examples with input data dimensionality of up to a hundred. Larger samples are a challenge for the deep neural networks mentioned above, which are trained on multiprocessor GPUs. These models are very interesting, but they are outside the focus of this article.
3. Training data must be representative. The training sample should represent the phenomenon being described fully and comprehensively and should include the various possible situations. It is good to have a lot of data, but that in itself does not always help. In narrow circles there is a widespread anecdote in which a geologist comes to a recognition specialist, puts a piece of mineral in front of him, and asks him to develop a system for recognizing such a substance. “Are there any more examples of the data?” asks the specialist. “Of course!” answers the geologist, pulls out a pickaxe, and splits his piece of mineral into several more pieces. As you understand, there is no benefit from such an operation: a sample enlarged this way carries no new information.
4. Shuffle the sample. After the input and output data vectors have been collected, if the measurements are independent of one another, change the order of the vectors in an arbitrary way. This is critical for correctly splitting the sample into Train/Test/Validation and for all “sample-by-sample” training methods.
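A minimal sketch of such a shuffle in Python with NumPy (the arrays X and y here are random placeholders standing in for your real input and output vectors):

    import numpy as np

    # X: input vectors, y: corresponding outputs (placeholder data for illustration)
    X = np.random.rand(1000, 20)
    y = np.random.randint(0, 2, size=1000)

    perm = np.random.permutation(len(X))   # one random permutation of the sample indices
    X, y = X[perm], y[perm]                # the same permutation is applied to inputs and outputs

The important detail is that the inputs and outputs are permuted with the same index array, so the pairing between them is preserved.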
5. Normalize and center the data. For multilayer perceptrons, as for many other models, the input values must lie within the range [-1; 1]. Before feeding the data to the neural network, subtract the mean from it and divide all values by the maximum absolute value.
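A minimal sketch of this centering and scaling in NumPy, done per feature (the raw data here is a placeholder):

    import numpy as np

    X = np.random.rand(1000, 20) * 50.0 + 10.0      # placeholder raw features on an arbitrary scale

    X_centered = X - X.mean(axis=0)                  # subtract the per-feature mean
    scale = np.abs(X_centered).max(axis=0)           # per-feature maximum absolute value
    X_norm = X_centered / np.where(scale == 0, 1.0, scale)   # every value now lies in [-1, 1]

Remember to store the mean and scale computed on the training data and apply exactly the same transformation to any new data fed to the trained network.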
6. Divide the sample into Train, Test, and Validation. The basic newbie error is to achieve the minimum error of the neural network on the training set, hellishly overfitting it along the way, and then to expect the same good quality on new real data. This is especially easy to do when there is little data (or it is all “from one piece”). The result can be very frustrating: the neural network adapts to the sample as much as possible and loses its performance on real data. To control the generalization ability of your model, divide all the data into three samples in the ratio 70:20:10. Train on Train, periodically checking the quality of the model on Test. For the final unbiased assessment, use Validation; a minimal split sketch is given after this rule.
The cross-validation technique, in which Train and Test are repeatedly formed, in turn and in an arbitrary way, from the same data, can be deceptive and give a false impression of good system quality - for example, if the data come from different sources and this is critical. Use a correct Validation!
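A minimal sketch of such a 70:20:10 split in NumPy, assuming the sample has already been shuffled as in rule 4 (the data and names are placeholders):

    import numpy as np

    X = np.random.rand(1000, 20)                     # placeholder data, already shuffled
    y = np.random.randint(0, 2, size=1000)

    n = len(X)
    n_train, n_test = int(0.7 * n), int(0.2 * n)     # 70 : 20 : 10 ratio

    X_train, y_train = X[:n_train], y[:n_train]
    X_test,  y_test  = X[n_train:n_train + n_test], y[n_train:n_train + n_test]
    X_val,   y_val   = X[n_train + n_test:], y[n_train + n_test:]

The Validation part is set aside and touched only once, for the final assessment of the chosen model.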
7. Apply regularization. Regularization is a technique that makes it possible to avoid overfitting a neural network during training, even when there is little data. If you find a checkbox with that word, be sure to tick it. A sign of an overfitted neural network is large weights, on the order of hundreds or thousands; such a neural network will not work properly on new data it has not seen before.
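As one possible illustration (not a prescription), scikit-learn's MLPClassifier exposes L2 weight-decay regularization through its alpha parameter; the data and the value of alpha below are placeholders:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(500, 10)                      # placeholder training data
    y = np.random.randint(0, 2, size=500)

    # alpha is the L2 penalty coefficient: larger values shrink the weights
    # and help avoid overfitting when there is little data.
    model = MLPClassifier(hidden_layer_sizes=(20,), alpha=1e-2, max_iter=500)
    model.fit(X, y)

    print(max(np.abs(w).max() for w in model.coefs_))   # sanity check: the weights should stay small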
8. Do not retrain the neural network online. The idea of continually training a neural network, all the time, on newly arriving data is in itself correct; in real biological systems this is exactly what happens. We learn every day and rarely go crazy. Nevertheless, for conventional artificial neural networks at the current stage of technical development this practice is risky: the network may overfit or adapt to the most recently received data and lose its generalization ability. For the system to be usable in practice, the neural network needs to: 1) be trained, 2) have its quality checked on the test and validation samples, 3) have a successful variant of the network selected and its weights fixed, and 4) be used in practice with the weights of the trained network left unchanged during use.
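A sketch of that “train once, fix the weights, then only predict” workflow, using a hypothetical scikit-learn model and joblib for serialization (both are assumptions for illustration, not part of the original recipe):

    import numpy as np
    import joblib
    from sklearn.neural_network import MLPClassifier

    X_train = np.random.rand(500, 10)                # placeholder training data
    y_train = np.random.randint(0, 2, size=500)

    # 1)-3) train, select the best variant on Test/Validation (omitted here),
    #       then freeze it by serializing the fitted model together with its weights.
    model = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500).fit(X_train, y_train)
    joblib.dump(model, "mlp_fixed.joblib")

    # 4) in production, only load and predict; never call fit() on incoming data.
    production_model = joblib.load("mlp_fixed.joblib")
    print(production_model.predict(np.random.rand(1, 10)))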
9. Use newer learning algorithms: Levenberg-Marquardt, BFGS, Conjugate Gradients, etc. I am deeply convinced that implementing learning by error backpropagation is the sacred duty of everyone who works with neural networks. This method is the simplest, it is relatively easy to program, and it allows you to study the learning process of neural networks well. Meanwhile, backpropagation was invented in the early 1970s and became popular in the mid-1980s; since then, more advanced methods have appeared that can improve the quality of training many times over. Better use them.
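As one hedged example of such a method, scikit-learn's MLPClassifier can be trained with an L-BFGS quasi-Newton solver instead of plain stochastic gradient descent (the data below is a placeholder):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(500, 10)                      # placeholder training data
    y = np.random.randint(0, 2, size=500)

    # solver='lbfgs' is a quasi-Newton method from the BFGS family; on small and
    # medium samples it usually converges faster and more reliably than 'sgd'.
    model = MLPClassifier(hidden_layer_sizes=(20,), solver="lbfgs", max_iter=1000)
    model.fit(X, y)

This is just one readily available option; the same idea applies to the Levenberg-Marquardt and conjugate-gradient trainers found in other packages.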
10. Train neural networks in MATLAB and similar friendly environments. If you are not a scientist developing new neural network training methods but a practicing programmer, I would not recommend coding the neural network training procedure yourself. There are a large number of software packages, mainly for MATLAB and Python, that let you train neural networks while monitoring the training and testing process and using convenient visualization and debugging tools. Use the heritage of humanity! I personally like the approach of “training in MATLAB with a good library - implementing the trained model by hand”; it is quite powerful and flexible. An exception is the STATISTICA package, which contains advanced methods for training neural networks and allows generating them as C source code, convenient for deployment.
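A minimal sketch of the “implementing the trained model by hand” part, written in Python for brevity (the weight matrices W1, W2 and biases b1, b2 are hypothetical values standing in for weights exported from the training environment):

    import numpy as np

    # Hypothetical weights exported from the training environment (e.g. MATLAB):
    W1, b1 = np.random.randn(10, 20), np.zeros(20)   # input layer -> hidden layer
    W2, b2 = np.random.randn(20, 1), np.zeros(1)     # hidden layer -> output layer

    def mlp_forward(x):
        """Forward pass of a two-layer perceptron with tanh activations."""
        hidden = np.tanh(x @ W1 + b1)
        return np.tanh(hidden @ W2 + b2)

    print(mlp_forward(np.random.rand(10)))           # placeholder input vector, already normalized

Once the forward pass is written out like this, the production code no longer depends on the training environment at all.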
In the next article I plan to describe in detail the full industrial cycle of preparing a neural network, implemented on the basis of the principles described above and used for recognition tasks in a commercial software product.
Good luck!
Literature
[1] Hinton G., Deng L., Yu D., Dahl G., Mohamed A., Jaitly N., Senior A., Vanhoucke V., Nguyen P., Sainath T. and Kingsbury B. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Processing Magazine, Vol. 29, No. 6, 2012, pp. 82-97.
[2] Ciresan D., Meier U., Masci J. and Schmidhuber J. Multi-Column Deep Neural Network for Traffic Sign Classification. Neural Networks, Vol. 34, August 2012, pp. 333-338.
[3] Osowski S. Neural Networks for Information Processing. Translated from Polish. Moscow: Finance and Statistics, 2002. 344 p.
[4] Bishop C. M. Pattern Recognition and Machine Learning. Springer, 2006. 738 p.
[5] Haykin S. Neural Networks: A Complete Course. Williams, 2006.