
Neural networks, genetic algorithms, and so on... Myths and reality

Continuing the article “Comparison of technological approaches to solving data extraction problems”, we consider the technologies most frequently mentioned in connection with the concept of “artificial intelligence” in the context of search tasks. Many articles on this topic have been published on habrahabr.ru, for example on the use of neural networks in Yandex search, which notes that "in fact, the machine writes the ranking formula (it turned out to be about 300 megabytes)", on deep learning, on probabilistic programming, and so on.

I would like to consider this topic from the point of view of the philosophy of logic, define the boundaries and problems of applicability, and speculate a little about the possibility of solving machine learning problems with the help of neural networks.

Any of the technologies listed below could serve as the basis for our reasoning; since neural networks are mentioned most often, we will take them. Typing something about neural networks into a search box brings up a huge mass of articles about the "unimaginable" successes achieved by neural networks: reports about new hardware solutions such as spintronic devices, IBM's claims that neural networks analyzing words can detect mental illness, "superhero" vision, and many other wonders of science. So let us try to give a brief overview of the current state of affairs.

First of all, I would like to single out the common pool of technologies united under the general term “artificial intelligence systems”. With varying degrees of confidence, it includes:
1. Neural networks
2. Genetic algorithms
3. Probabilistic programming methods, etc.

It may seem to some that completely different technologies have been lumped together here. But this impression is deceptive: despite the obvious differences, they all share common features:

1. They assume that the system is trained.
2. The knowledge base is built on a set of samples organized by classification features.
3. They assume redundant, competing computations that run until one of the streams reaches a given confidence threshold.
4. The result of the computation is usually one of the precedents from a predetermined list.

Learning itself is characterized by the following main features:
1. The presence of a priori knowledge given in the form of classifying models.
2. The presence of a sample base for building a “model of the world” according to the classification features.

To begin with, let us briefly describe the distinctive features of each of the above approaches.

Neural networks
According to Wikipedia (the link to it will hopefully be forgiven), neural networks are "a system of simple processors (artificial neurons) connected to and interacting with each other". Nicely put. There are various implementation options, such as Bayesian networks [1], recurrent networks [2], and so on. The main working model is: a base of images, a transfer function, an optimizer.

The most widely used today are restricted Boltzmann machines in a multi-layered version. Layering, i.e. depth, is needed to overcome the “XOR” problem. In addition, as Hinton showed, increasing the number of hidden layers makes it possible to improve accuracy thanks to “intermediate” images that differ only minimally at each layer. The closer an intermediate layer is to the output, the more specific the image.
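As a small illustration of the “XOR” point, here is a minimal sketch (my own, not from the article) of a tiny network with one hidden layer trained on XOR using plain numpy; a network without a hidden layer cannot separate these four points, while a single hidden layer handles them easily.

```python
# Minimal sketch (not from the article): a 2-4-1 network learns XOR with
# plain gradient descent; a single-layer perceptron cannot represent XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: squared-error loss, learning rate 0.5
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * (h.T @ d_out)
    b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * (X.T @ d_h)
    b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

print(out.round(2))   # should be close to [[0], [1], [1], [0]]
```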

The main purpose of creating neural networks, and hence of the learning task, is to eliminate the need for intermediate computational inferences when analyzing the profile matrix of incoming signals. This goal is achieved by building a base of reference profiles, each of which must correspond to a single neuron at the output, a cell of the resulting matrix. Each such neuron is assigned a certain interpretation, the result.

The main problem, both for the task being solved and for the training itself, is the noise in the incoming matrices fed to the input neurons. Therefore, one of the main requirements is a high-quality training sample. If the training sample is of poor quality, the high noise level will lead to a large number of errors; however, an excessively large training set can lead to the same result.

To some extent, the operation of a neural network can be compared with the unconditioned reflexes of living beings [3], with all the attendant drawbacks.

This approach works well for tasks where there is no strong noise and there is a clear a priori base of reference images. The main task is to choose the most suitable image from an existing knowledge base. Forecasting tasks, in this case, are solved only by extrapolating the existing history, without the possibility of creating new entities, i.e. induction with insufficient deduction.

Some may argue that this is not the case. But before rejecting the above, one needs to decide what is meant by the term “new entity”. Is it just another instance within the existing vector space of classes of the chosen subject area, or the creation of new spaces and areas? One of the following articles will be devoted to the topic of what can and what cannot be compared.

The basic principle of learning is induction. Induction is difficult here, if only because of the initial formulation of the problem, which is to get rid of intermediate calculations. A possible objection, that processes associated with induction can take place at the training stage, is weak, since training is based on the following principles:

1. The very appearance of a new profile and of its reflection (one that is not noise) requires an outside expert to determine whether the profile corresponds to the result. [1]
2. There is no simple and reliable mathematical apparatus that clearly describes the conditions and rules for generating new dimensions and, therefore, new classes of objects.
3. The tracing procedure itself is a process of generalization, of searching for new routes and paths, albeit with the need to control the uniqueness of the correspondence between profiles and results.

Possible arguments about associative fields do not add anything new, as they are just an extension of the existing deductive approach.

Genetic algorithms
According to the same Wikipedia: “The genetic algorithm is a heuristic search algorithm that is used to solve optimization and modeling problems by randomly selecting, combining and varying the desired parameters using mechanisms similar to natural selection in nature.”

There are many works by authors such as T.V. Panchenko [4], L.A. Gladkov, V.V. Kureichik [5], and others. The foundations of the “genetic approach” are well covered here.

There are many interesting works on the use of genetic algorithms: for example, the work of I.Yu. Pleshakova and S.I. Chuprina [6], the article by V.K. Ivanov and P.I. Mankin [7], articles on Habr, and a number of others.

One of the most important advantages of genetic algorithms is that they require no information about the behavior of the function being optimized, and possible discontinuities have little influence on the optimization process. As with neural networks, the need to analyze cause-and-effect relationships is sidestepped by building a “final” image, the objective function. In this sense, from the point of view of text analysis and search, genetic algorithms solve the same problems as the methods of latent semantic analysis, or very similar ones. At the same time, to give credit where it is due, in terms of semantic search and text indexing, genetic algorithms have much greater prospects than latent semantic analysis.
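To make the mechanics concrete, here is a minimal sketch (my own, not from the article) of a genetic algorithm with selection, crossover and mutation maximizing a deliberately discontinuous one-dimensional function; only function values are used, with no derivatives or any other knowledge of the function's behavior.

```python
# Minimal sketch (not from the article): a genetic algorithm maximizing a
# discontinuous function. Only function values are used; no gradients.
import random

def fitness(x):
    # A deliberately discontinuous objective on [0, 10].
    return int(x) % 3 + (1.0 if 4.0 <= x < 4.5 else 0.0)

def evolve(pop_size=50, generations=100, mutation_rate=0.2):
    population = [random.uniform(0, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2.0                   # crossover: blend two parents
            if random.random() < mutation_rate:     # mutation: random perturbation
                child += random.gauss(0, 0.5)
            children.append(min(max(child, 0.0), 10.0))
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))  # should land in a region of maximal fitness
```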

From the point of view of pattern recognition, and with a very big stretch, the objective function can be compared with the layer of input neurons, and the expectation of a maximum with maximizing the signal of an output-layer neuron. It would be more correct, though, to say that genetic algorithms are used to improve the efficiency of neural network training; they can hardly be considered competitors of neural networks, since the tasks are different.

The common drawback, the absence of induction algorithms, is fully present here as well.

Probabilistic programming methods
The inclusion of probabilistic programming methods in this article is more a tribute to fashion than a necessity. Stochastic methods, which today are proudly called probabilistic programming, have been known for quite some time and, like neural networks, are experiencing another takeoff. Genetic algorithms are a good example of the stochastic approach. There are plenty of articles on the Internet, for example “Is probabilistic programming the key to artificial intelligence?”. Therefore, it makes no sense to dwell on the methods themselves in detail, and we proceed directly to the conclusions.

The most accurate definition of what is meant by probabilistic programming is found here: “a compact, compositional way of representing generative probabilistic models and performing statistical inference in them, taking the data into account, using generalized algorithms”. It is not something fundamentally new, but it is an interesting addition to machine learning methods.
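As a minimal illustration of “a generative model plus generalized inference” (again my own sketch, not from the article), the code below infers the unknown bias of a coin from observed tosses by a brute-force grid approximation of Bayes' rule; probabilistic programming systems automate exactly this kind of inference for far richer models.

```python
# Minimal sketch (not from the article): a generative model (coin with unknown
# bias) and brute-force posterior inference over a grid, no PPL library needed.
import numpy as np

data = np.array([1, 1, 0, 1, 1, 1, 0, 1])       # observed tosses (1 = heads)

theta = np.linspace(0.001, 0.999, 999)          # candidate values of the bias
prior = np.ones_like(theta) / theta.size        # uniform prior

# Likelihood of the observed sequence for each candidate bias.
heads, tails = data.sum(), (1 - data).sum()
likelihood = theta**heads * (1 - theta)**tails

posterior = prior * likelihood
posterior /= posterior.sum()                    # normalize (Bayes' rule)

print("posterior mean bias:", float((theta * posterior).sum()))  # about 0.7
```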

Thus, today the term “AI” rather denotes a subset of technological (algorithmic) approaches to solving combinatorial problems, whose main tasks are the reliable identification of inherently “statistically significant” regularities and the construction of image-objects on the basis of statistics, without analyzing cause-and-effect relationships. The main direction is pattern recognition, where the patterns may be images, sounds, combinations of disease symptoms, and so on.

The result of training a neural network or running a genetic algorithm should be some identified pattern, represented as a certain matrix (vector). Of course, this matrix or set can be continually corrected with new examples, but this does not change the essence of what is happening. In any case, the set that has been identified and cleared of noise can be represented as “alienable logic”, a kind of “optimal” way of solving the problem. An example of such an area is automatic text categorization, not in the sense of filing texts under already known headings, but the creation of the headings themselves and their annotation, as well as the automatic construction of various kinds of ontologies.

Conclusions
The birth of modern mathematics was, of course, a long process stretching over centuries. But, observing current trends, a disappointing conclusion suggests itself: everything moves in a circle. The philosophers of ancient Greece did not know mathematics and mathematical formulas; they operated at the level of images and "everyday" concepts. This was not enough for organizing more complex and, most importantly, abstract reasoning. In addition, one of the main tasks of mathematics is the search for logics that significantly reduce the cost of calculations by deriving compact and optimal laws. All this gave the impetus for the creation of today's mathematics with its modern notation, whose beginnings are usually placed no earlier than the 16th century, with scientists such as Descartes, Leibniz and others.

Modern reasoning and the logic of what is today called "artificial intelligence" follow the same path. The current state of affairs "leads" back to basics, since it rests on the same principles of searching for "common" patterns, rather in the style of Pythagoras and Euclid. The use of AI logic is limited to areas that, from a human point of view, could be called areas of unconditioned reactions: a tiger must be recognized without any analysis, immediately and quickly, before it eats the observer. Most image recognition and disease diagnosis tasks fall into this category.

It is still unclear how, given their basic primitivity, these algorithms could be applied to problems that require generating new logic; in other words, to creating systems that can actually resolve cause-and-effect relationships or generate new hypotheses and dimensions.

The birth of a mathematics capable of induction is still ahead, and the explosive growth of interest in AI is mainly due to the growth of computing power, not to the emergence of new algorithms. As a result of this growth, however, a point has been reached beyond which solving a large volume of tasks, large in terms of applications and input data but relatively small in analytical complexity, has become economically viable. But this is still an extensive path of development.

None of the above is a claim that neural networks or similar technologies are futile. The range of tasks and their value are enormous: help in image recognition, help to experts in various fields in analyzing data and details that seem insignificant at first sight. A good example of such an application is AI assistance in making diagnoses.

As they say, to be continued. The following articles are intended to reflect, from a "programming" point of view, on the basic concepts without which the birth of an artificial intelligence with thinking abilities is impossible: ontologies, objects and their properties, quality, completeness; what can and cannot be compared, and what is needed for that at all. We will also think about what a machine needs in order to be capable of induction...

[1] A Bayesian network (also Bayesian belief network; English: Bayesian network, belief network) is a graphical probabilistic model consisting of a set of variables and their probabilistic Bayesian dependencies. For example, a Bayesian network can be used to calculate the probability that a patient is sick, given the presence or absence of a number of symptoms, based on data about the relationships between symptoms and diseases. The mathematical apparatus of Bayesian networks was created by the American scientist Judea Pearl, winner of the Turing Award (2011).

[2] Recurrent neural networks (English: recurrent neural network, RNN) are a type of neural network with feedback, where feedback means a link from a logically more remote element to a less remote one. The presence of feedback makes it possible to memorize and reproduce whole sequences of reactions to a single stimulus. From a programming point of view, an analogue of cyclic execution appears in such networks; from a systems point of view, such a network is equivalent to a finite state machine. These features potentially provide many opportunities for modeling biological neural networks. However, most of these opportunities are currently poorly understood due to the variety of possible architectures and the complexity of their analysis.

[3] A. Barsky, Logical Neural Networks. Moscow: Internet University of Information Technologies, 2007.

[4] T.V. Panchenko, Genetic Algorithms: study guide / ed. Yu.Yu. Tarasevich. Astrakhan: Astrakhan University Publishing House, 2007.

[5] L.A. Gladkov, V.V. Kureichik, V.M. Kureichik, Genetic Algorithms: study guide, 2nd ed. Moscow: Fizmatlit, 2006. 320 pp. ISBN 5-9221-0510-8.

[6] I.Yu. Pleshakova, S.I. Chuprina, "A genetic algorithm for improving the quality of semantic search in the texts of scientific publications", cyberleninka.ru/article/n/geneticheskiy-algoritm-dlya-uluchsheniya-kachestva-semanticheskogo-poiska-po-tekstam-nauchnyh-publikatsiy

[7] V.K. Ivanov, P.I. Mankin, "Implementation of a genetic algorithm for effective documentary thematic search", Tver State Technical University.

Source: https://habr.com/ru/post/321140/

