Continuing the theme of the evolution of neural networks.

Extracting explicit knowledge from accumulated data is a problem much older than computers. Trained neural networks can produce hidden knowledge from data: a skill of prediction, classification, pattern recognition and so on is created, but its logical structure usually remains hidden from the user. The problem of revealing this hidden logical structure is solved by reducing the neural network to a special “logically transparent” sparse form, using a procedure called contrasting.
Every researcher who decides to use neural networks faces two questions: “How many neurons are needed to solve the problem?” and “What should the structure of the neural network be?” Combining these two questions, we get a third: “How can the work of the neural network be made understandable to the user (logically transparent), and what benefits can such an understanding bring?”
How many neurons do you need to use?
When answering this question, there are two opposing points of view. One of them claims that the more neurons are used, the more reliable the network will be. Proponents of this position cite the example of the human brain. Indeed, the more neurons, the greater the number of connections between them, and the more complex the tasks the network can solve. In addition, if you take a deliberately larger number of neurons than is needed to solve the problem, the network is sure to learn; if you start with a small number, the network may prove unable to learn the task, and the whole process will have to be repeated from the beginning with a larger number of neurons. This point of view (the more, the better) is popular among developers of neural network software; many of them name the ability to use any number of neurons as one of the main advantages of their programs.
The second point of view rests on an “empirical” rule: the more fitting parameters, the worse the function is approximated in those regions where its values were not known in advance. From a mathematical point of view, training a neural network amounts to extending a function given at a finite number of points to the entire domain. With this approach, the network's input data are treated as the arguments of the function and the network's response as its value. The standard illustration is fitting a handful of sample points with polynomials: a polynomial with as many coefficients as there are points passes through every point exactly, but tends to oscillate wildly between them, while the approximation obtained with a 3rd-degree polynomial corresponds much better to our intuitive idea of the “correct” approximation. Despite its simplicity, this example demonstrates the essence of the matter quite clearly.
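To make the polynomial illustration concrete, here is a minimal sketch; the underlying function, the noise level and the degrees compared are my own illustrative choices, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training" points: a smooth underlying function observed with a little noise.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = x_train**3 - x_train + rng.normal(scale=0.2, size=x_train.size)

# Points where the function value was "not known in advance".
x_test = np.linspace(-1.0, 1.0, 200)
y_true = x_test**3 - x_test

for degree in (3, 9):   # a modest cubic vs. as many parameters as data points
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    y_fit = np.polyval(coeffs, x_test)
    worst = np.max(np.abs(y_fit - y_true))
    print(f"degree {degree}: worst error between the sample points = {worst:.2f}")
```

The degree-9 polynomial reproduces the noisy training points almost exactly, yet typically behaves much worse between them than the cubic fit, which is exactly the point of the “empirical” rule.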
The second approach defines the required number of neurons as the minimum necessary. Its main drawback is that this minimum number is not known in advance, and the procedure of determining it by gradually increasing the number of neurons is very laborious. Judging by the experience of various groups working in medical diagnostics, space navigation and psychology, none of these tasks has ever required more than a few dozen neurons. Summing up the two extreme positions: a network with the minimum number of neurons should approximate the function better (“more correctly”, more smoothly), but finding this minimum number requires considerable intellectual effort and many training experiments; with a redundant number of neurons the result may be obtained on the first attempt, but at the risk of building a “bad” approximation. The truth, as usual in such cases, lies in the middle: choose somewhat more neurons than necessary, but not many more. This can be done by doubling the number of neurons in the network after each failed learning attempt, as in the sketch below. There is, however, a more reliable way to estimate the minimum number of neurons: the contrasting procedure. Moreover, contrasting also answers the second question, namely what the structure of the network should be.
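A rough sketch of the “double and retry” strategy, using scikit-learn's MLPClassifier; the toy dataset, the starting size, the accuracy threshold and the ceiling of 64 neurons are arbitrary placeholders, not part of the original recipe.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

n_hidden = 2                      # start deliberately small
while n_hidden <= 64:             # give up at some reasonable ceiling
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                        max_iter=2000, random_state=0)
    net.fit(X, y)
    acc = net.score(X, y)         # training accuracy: "did the network learn?"
    print(f"{n_hidden:3d} hidden neurons -> training accuracy {acc:.3f}")
    if acc >= 0.97:               # learned the task "well enough": stop
        break
    n_hidden *= 2                 # failed attempt: double and try again
```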
The contrasting procedure is based on a simple principle: the connections with the least influence on the result are excluded one by one, with retraining in between, until the network loses the ability to relearn the task.
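A heavily simplified sketch of this idea, not the full procedure from the book: here the “significance” of a connection is approximated by the magnitude of its weight, the retraining step that normally follows each exclusion is omitted, and connections are zeroed one by one until the training accuracy starts to drop.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
required_accuracy = net.score(X, y)          # the skill we must not lose

while True:
    # list every remaining connection as (|weight|, layer, row, column)
    candidates = [(abs(W[i, j]), k, i, j)
                  for k, W in enumerate(net.coefs_)
                  for i, j in zip(*np.nonzero(W))]
    if not candidates:
        break
    _, k, i, j = min(candidates)             # least "significant" connection
    saved = net.coefs_[k][i, j]
    net.coefs_[k][i, j] = 0.0                # exclude it
    if net.score(X, y) < required_accuracy:  # the network lost the skill
        net.coefs_[k][i, j] = saved          # put the last connection back
        break

kept = sum(int(np.count_nonzero(W)) for W in net.coefs_)
print("connections kept:", kept)
```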
Logically transparent neural networks
One of the main disadvantages of neural networks, in the eyes of many users, is that a network solves the problem but cannot tell you how: an explicit algorithm for solving the problem cannot be extracted from a trained network. However, a specially constructed contrasting procedure makes it possible to solve this problem as well.
For example, let us require that every neuron have no more than three input signals. Take a network in which all input signals are fed to all neurons of the input layer, and every neuron of each subsequent layer receives the output signals of all neurons of the previous layer, and train it to solve the problem without error.
After that, contrasting is carried out in several stages. At the first stage we contrast only the connection weights of the input-layer neurons. If after contrasting some neurons are left with more than three input signals, we increase the number of neurons in the input layer and repeat. Then we perform the same procedure for each of the remaining layers in turn. Once this is done, a logically transparent network is obtained; additional contrasting can be applied to obtain a minimal network. Note that if by logically transparent networks we mean networks in which each neuron has no more than three inputs, then minimality of the network obviously does not entail logical transparency.
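As an illustration of the fan-in constraint from this example, here is a small sketch that, for each neuron, keeps only the three strongest incoming connections and drops the rest; the weight matrices are random placeholders, and the retraining that the real procedure performs after contrasting each layer is left out.

```python
import numpy as np

def limit_fan_in(weights, max_inputs=3):
    """Zero out all but the `max_inputs` largest-magnitude inputs of each neuron."""
    sparse = []
    for W in weights:                        # one matrix per layer, input layer first
        W = W.copy()
        for neuron in range(W.shape[1]):     # columns correspond to neurons
            incoming = np.abs(W[:, neuron])
            drop = np.argsort(incoming)[:-max_inputs]   # all but the strongest three
            W[drop, neuron] = 0.0
        sparse.append(W)
    return sparse

rng = np.random.default_rng(0)
layers = [rng.normal(size=(10, 6)), rng.normal(size=(6, 1))]   # toy weight matrices
for k, W in enumerate(limit_fan_in(layers)):
    print(f"layer {k}: inputs per neuron =", np.count_nonzero(W, axis=0))
```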
Conclusion
The technology of obtaining explicit knowledge from data with the help of trained neural networks looks quite simple and does not seem to cause any problems: you just need to implement it and use it.
The first stage: train the neural network to solve the basic problem. The basic problem is usually recognition, prediction (as in the previous section), and so on. In most cases it can be interpreted as the task of filling gaps in the data. Such a gap may be the name of an image in a recognition task, a class number, the value being forecast, and so on.
The second stage: using analysis of significance indicators, contrasting and additional training (all of these, as a rule, applied repeatedly), we bring the neural network into a logically transparent form, so that the acquired skill can be “read”.
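Once the network is in this sparse form, “reading” the skill amounts to listing which inputs feed which neuron. A toy illustration: the feature names and weight matrices below are invented, standing in for the output of a real contrasting run.

```python
import numpy as np

feature_names = ["age", "pressure", "pulse", "temperature"]   # hypothetical inputs
sparse_layers = [
    np.array([[ 0.9, 0.0],
              [ 0.0, 1.1],
              [-0.7, 0.0],
              [ 0.0, 0.4]]),           # 4 inputs -> 2 hidden neurons
    np.array([[ 1.3],
              [-0.8]]),                # 2 hidden neurons -> 1 output
]

names = feature_names
for depth, W in enumerate(sparse_layers):
    next_names = []
    for j in range(W.shape[1]):
        used = np.nonzero(W[:, j])[0]                 # which inputs this neuron keeps
        terms = [f"{W[i, j]:+.1f}*{names[i]}" for i in used]
        neuron = f"h{depth}_{j}" if depth < len(sparse_layers) - 1 else f"out{j}"
        print(f"{neuron} = f({' '.join(terms)})")
        next_names.append(neuron)
    names = next_names
```

For a network pruned down to three inputs per neuron, such a printout is short enough for a domain expert to interpret, which is precisely what logical transparency is meant to provide.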
The result is not unique: starting from a different initial network, you can arrive at a different logically transparent structure. For each database there are several variants of explicit knowledge. This may be seen as a shortcoming of the technology, but researchers argue the opposite: a technology that yields only a single variant of explicit knowledge is unreliable, and the non-uniqueness of the result is a fundamental property of producing explicit knowledge from data.
And as a bonus, a photo of Xiao Xiao Li, a charming Microsoft Research employee who develops Bayesian anti-spam filters.

Bibliography
Neuroinformatics / A.N. Gorban, V.L. Dunin-Barkovsky, A.N. Kirdin, et al. Novosibirsk: Nauka, Siberian Enterprise of RAS, 1998.