Introduction
In previous articles ( http://habrahabr.ru/post/150056/ and http://habrahabr.ru/post/150902/ ) we considered the simplest models of ideal minimal intelligence (IMI), in particular the AIξ model. With minor reservations, one can almost agree that the "AIXI model is the most intelligent unbiased agent possible" [Hutter, 2007] and that IMI would be no more limited in its behavior than a human, given sufficient computing resources and information. This last disclaimer is the main reason why these models have not led to the creation of a real AI and why they can be considered only as a first small step towards it. It is important to determine where to go next.
As we noted in the previous articles, what is of interest is a real, "pragmatic" intelligence (that is, one optimized for our world), rather than the "unbiased" intelligence (Pareto-optimal over the whole class of computable environments) described by IMI models. A pragmatic AI cannot be built solely on the basis of a purely analytical treatment of the problem of choosing optimal actions in an arbitrary environment. Even the most effective self-optimization would be insufficient, since it would require not only an enormous amount of computation, comparable to that of evolution, but also a corresponding number of physical interactions between the agent and the environment. The pragmatism of a universal AI should be provided by introducing a "cognitive bias", elements of which are revealed in some form in classical studies of AI and of human thinking, whose results all have empirical utility. The information accumulated there cannot be discarded, but it requires a special interpretation within the framework of the theory of universal intelligence. We will show that introducing cognitive functions does not expand the fundamental capabilities of IMI, but should increase its effectiveness/pragmatism.
Perception.
IMI models do not include such a distinguished cognitive function as perception, if by perception we mean not just receiving sensory data as input, but precisely those specific cognitive processes that are characteristic of natural systems. At the same time, natural perceptual systems have a pronounced structure that imposes a large inductive bias consistent with the regularities encountered in the real world. This bias is realized in the form of representations of information and makes it possible to interpret sensory data very efficiently, without exhaustive search.
Using the example of perception, it should be absolutely clear that IMI models, which require a direct search over algorithmic models, for example for images longer than millions of bits (i.e. with more than 10^100000 candidate models under consideration), are absolutely unrealistic. At the same time, it should be emphasized that the human sensory perception system is universal: it can detect a stimulus defined by an almost arbitrary regularity, whereas for any computer vision system it is very easy to find a class of stimuli it cannot identify. This is well illustrated by attempts to model the formation of a conditioned reflex to nontrivial stimuli (see, for example, [Potapov and Rozhkov, 2012] and references therein). It is in this context that the view is expressed that, despite significant progress in robotics, artificial intelligence, machine perception and learning, there is a lack of truly cognitive systems with sufficient generality to work in an unstructured environment [Pavel et al., 2007], which is explicitly connected with the non-universality (in the sense that the model space is not algorithmically complete) of perceptual systems.
In IMI, the process of building models includes perception implicitly: perception, as a specific cognitive function, is not separated in it from more complex symbolic models of the world. Such a division itself (though it must be a "soft" division!) can be considered a heuristic, but it alone is naturally not enough, and a general question arises of how to make the model-building process in IMI more efficient.
Models.
IMI does not involve selecting and fixing models of the environment. For optimal prediction, all possible models of all the available data are considered with different weights, which are taken into account in the prediction. Naturally, to us this looks absolutely wasteful. It may even sometimes seem that a person is inclined to look for some single true model of the world. In particular, all of science is an attempt to build some kind of unified model of the world governed by unambiguous laws. Of course, in the process of scientific research various competing theories are considered, but in the end a choice is made between them. At the same time, the multiplicity of theories is largely supported not within a single intellect, but by the multi-agent nature of society.
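To make this concrete, here is a toy sketch (ours, not the AIξ formalism itself) of such mixture prediction: every candidate model contributes to the forecast with weight 2^(-description length), and no single model is ever selected.

# A toy sketch (illustrative, not the AIXI/IMI formalism itself): every model consistent
# with the history contributes to the prediction with weight 2**(-description_length),
# instead of a single "true" model being selected.
def mixture_predict(history, models):
    """models: list of (description_length, predict_fn), where predict_fn(history)
    returns a probability distribution over the next symbol, e.g. {'0': p, '1': 1 - p}."""
    weights = [2.0 ** (-length) for length, _ in models]
    total = sum(weights)
    prediction = {}
    for w, (_, predict_fn) in zip(weights, models):
        for symbol, p in predict_fn(history).items():
            prediction[symbol] = prediction.get(symbol, 0.0) + (w / total) * p
    return prediction

# usage with two hand-made "models": a shorter one expecting alternation,
# a longer one expecting repetition
models = [
    (3, lambda h: {'0': 0.9, '1': 0.1} if h and h[-1] == '1' else {'0': 0.1, '1': 0.9}),
    (5, lambda h: {'0': 0.9, '1': 0.1} if h and h[-1] == '0' else {'0': 0.1, '1': 0.9}),
]
print(mixture_predict("0101", models))   # both models contribute, the shorter one more strongly

Even in this toy form the wastefulness is visible: the work grows with the number of models retained, and in universal induction that set is infinite.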
In the case of perception, there is also a pronounced tendency to choose a single model or interpretation. This is especially clearly seen with ambiguous (bistable) illusions, when human vision chooses one interpretation out of two equivalent ones. A person can consciously cause vision to switch to the other interpretation, but cannot see both variants at the same time. This happens both at fairly low levels, say that of a structural description, and at the semantic level. There are many well-known ambiguous figures, which there is no need to list here once again.
However, it must be especially emphasized here that the choice of a single consistent model for prediction when choosing actions is not only unnecessary, but even harmful. Therefore a real AI, generally speaking, is not obliged to try to build a single (and unified) model of the environment, nor to build it explicitly. Attempts to create AI around globally true models (a large base of consistent axioms) have run into considerable difficulties: there are problems with truth maintenance when new information arrives, the implementation of inductive behavior becomes problematic, etc.
A person can quite naturally be guided by contradictory data and even contradictory models, using some models in some cases and other models in others. One can even say that there are no contradictory data "in nature" (at least there are no such data for IMI); they appear as a heuristic characteristic when universal prediction models are simplified. Also, a person may fail to make something out, or not hear it clearly, and put forward different hypotheses about what was there. That is, the theoretically ideal consideration of all possible models is replaced by a "smart" analysis of the results of enumerating models.
The selection and incremental refinement of models will naturally be necessary in a real AI. Performing induction over all available data and enumerating all possible models at every step is extremely wasteful under limited resources. But, at the same time, the introduction of resource constraints should not be so rigid that only a single "true" model remains.
In particular, thoughtless simplification in model search and prediction leads to the loss of important forms of behavior (for example, inductive behavior aimed at seeking information). In science, too, the choice between theories is made not simply on the basis of the current data: different theories are considered simultaneously in order to determine which experiments will provide new information that reduces the existing uncertainty. It is thanks to this inductive behavior (possible when many models are considered) that information is accumulated which increases the difference in the quality of the models so much that the choice becomes almost unambiguous. Models that are similar in content (and quality) are processed not as independent different models, but as one "fuzzy" model. Introducing such indefinite/fuzzy models may make it possible to obtain the effects of inductive behavior even when a single model is chosen: indeed, some actions will lead to a greater reduction of uncertainty, which will allow greater or lesser reinforcement to be obtained when performing subsequent actions. Naturally, the question of how to replace sets of models with "fuzzy" models in the most efficient way remains and requires theoretical consideration.
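As a minimal illustration of what such inductive behavior could mean computationally (the two "models" and all numbers below are made up for the example), an agent can prefer the action whose possible outcomes are expected to reduce the uncertainty over the competing models the most:

import math

def entropy(weights):
    total = sum(weights.values())
    return -sum((w / total) * math.log2(w / total) for w in weights.values() if w > 0)

def expected_information_gain(model_weights, likelihood, outcomes, action):
    """model_weights: {model: weight}; likelihood(model, action, outcome) -> probability.
    Returns the expected reduction in entropy over the models after observing the outcome."""
    h_before = entropy(model_weights)
    gain = 0.0
    for outcome in outcomes:
        p_outcome = sum(w * likelihood(m, action, outcome) for m, w in model_weights.items())
        if p_outcome == 0:
            continue
        posterior = {m: w * likelihood(m, action, outcome) / p_outcome
                     for m, w in model_weights.items()}
        gain += p_outcome * (h_before - entropy(posterior))
    return gain

# usage: two competing models; the action "probe" distinguishes them, "wait" does not
weights = {"model_A": 0.5, "model_B": 0.5}
def likelihood(model, action, outcome):
    if action == "wait":
        return 0.5
    return 0.9 if (model == "model_A") == (outcome == "ping") else 0.1

for action in ("probe", "wait"):
    print(action, expected_information_gain(weights, likelihood, ["ping", "silence"], action))

Here "probe" is preferred precisely because it discriminates between the models, which is the information-seeking behavior discussed above.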
So, models appear both as a "caching" of the results of induction obtained at previous moments in time and as a restriction on enumerating the entire infinite set of models in universal induction; but these models should be introduced "gently" so as not to lose universality.
Representation.
Limiting the number of models under consideration is necessary, but far from sufficient. Even using one best model (that is, using Kolmogorov complexity instead of algorithmic probability) turns out to be unrealistically expensive: the search time for a model of length L is proportional to 2^L. The complexity of a model depends on the choice of the reference machine (programming language). But even with a successful choice of the reference machine, the length of some models (describing real regularities) will be too large for these models to be found by direct search. Additional metaheuristics are needed here.
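The following toy sketch (our illustration; the three-instruction language is invented for the example) shows where the 2^L figure comes from: even for a tiny language, the number of candidate programs grows exponentially with the model length.

# A minimal sketch (illustrative only, not the IMI algorithm itself) of brute-force model
# search: enumerate every program of a toy 3-instruction language in order of increasing
# length until one reproduces the target data. At length L there are 3**L candidates,
# which is why this strategy is hopeless for data of realistic size.
from itertools import product

OPS = "01D"  # '0'/'1' append a bit, 'D' doubles the string built so far

def run(program, limit=64):
    out = ""
    for op in program:
        out = out * 2 if op == "D" else out + op
        if len(out) > limit:
            break
    return out

def shortest_program(target, max_len=12):
    checked = 0
    for length in range(1, max_len + 1):
        for prog in product(OPS, repeat=length):   # 3**length candidates at this level
            checked += 1
            if run("".join(prog), limit=len(target)) == target:
                print("checked", checked, "candidate programs")
                return "".join(prog)
    return None

print(shortest_program("0" * 16))   # finds a length-5 program such as '00DDD' after ~150 candidates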
Within the framework of the problem of visual perception, we have already attempted to bring together universal induction and real methods of image analysis [Potapov, 2012]. This is the principle of the representational minimum description length, which arises from the need to decompose the task of building a model of the complete history of the agent's interaction with the environment into subtasks that are almost independent. If some long string of sensory data is divided into substrings, the total complexity of the substrings described independently will be much greater than the algorithmic complexity of the history as a whole; such direct decomposition is therefore unacceptable. However, if mutual information is extracted from these substrings and used as prior information when describing each substring separately, the total conditional algorithmic complexity of the substrings will be much closer to the complexity of the history. This mutual information can be interpreted as (or expressed in the form of) a representation (a method of description). The introduction of representations is similar to the choice of a reference machine, but differs in two respects: the concept of representation includes an additional metaheuristic of decomposition, and different representations, which need not correspond to algorithmically complete model spaces, can be used for different data fragments (whereas the reference machine specifies a single prior over the whole model space).
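In rough notation (ours, used only to restate the argument above, not a quotation from [Potapov, 2012]): if the sensory history x_1 x_2 ... x_n is split into fragments x_i and S denotes the representation capturing their mutual information, then

\sum_i K(x_i) \gg K(x_1 x_2 \ldots x_n), \qquad K(S) + \sum_i K(x_i \mid S) \approx K(x_1 x_2 \ldots x_n),

so the overhead of describing each fragment separately is removed by paying for the representation S once.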
Indeed, if we take vision as an example, the description of images (both in natural systems and in applied automatic methods) is always carried out within the framework of a certain a priori representation, whose purpose is not only to shift the probability distribution over the space of environment models, but also to make their decomposition possible. In particular, thanks to the use of a priori image representations, computer vision methods can be applied to each image separately, rather than requiring a large number of different images as input, from whose combination the complex regularities present in each image could be revealed.
The concept of representation is very productive. The idea of representations applies to representations of sensory data, to representations of knowledge, and to mental representations (that is, representations can be called a common cognitive feature of natural intelligence). Even the general idea of hierarchical descriptions, which has independent value, should be regarded as an idea of an important but particular type of representation. Hierarchical decomposition is naturally potentially more efficient, and representations of this type are quite common in machine perception. However, intensive hierarchical decomposition leads to a decrease in the quality of the models built within the corresponding concepts. This negative effect can be compensated by introducing adaptive resonance. Again, in some approaches to strong AI the value of adaptive resonance is absolutized (it is believed to be the key to strong AI). Although the value of the adaptive resonance mechanism is certainly great, it must be understood that it is only one of the metaheuristics that can be formalized within the framework of the theory of universal induction.
It is worth noting the insufficiency of innate representations, even in the case of sensory perception. There is much evidence that, both in humans and in many other animals, representations (even at very low levels of perception) adapt to the specific environment. For AI, then, there is a need for automatic construction of representations, which can also be explored within the theory of universal induction [Potapov et al., 2010] and which probably should be an element of the self-optimization of an effective universal AI, since learning representations is a more specific and "pragmatic" way of incrementally refining the reference machine that defines the a priori probability distribution over the model space. But we should again emphasize the need to preserve the universality (algorithmic completeness) of the set of representations.
Planning.
Above, we spoke about incremental model building as a way to reduce enumeration as the history gradually lengthens. However, the problem of choosing optimal actions also has a high computational complexity, and for this problem it is quite natural to introduce incremental solution schemes. Such schemes lead to the concept of planning, which is also one of the cognitive characteristics of humans.
It can be noted that not only IMI, but also weak AI methods based on "brute force" do not use planning. In particular, this applies to successful chess programs, in which respect they differ strikingly from human players, who almost certainly rely on plans, following the principle that "a bad plan is better than none" [Bushinsky, 2009]. Planning, which includes reusing search results from previous time steps, saves resources. Indeed, plans are made in advance (when circumstances allow, that is, when there are free computing resources), and they are only refined in the course of execution, which means there is no need to rebuild the entire search tree from scratch at every moment. Such a strategy can, of course, be included in IMI, but a good implementation of it may be nontrivial. In this sense, a chess program is inefficiently intelligent, in contrast to a human. However, the fact that for a narrow class of environments such as chess it is easier to build an inefficient AI does not mean that the same option exists for universal intelligence.
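A minimal sketch (our illustration, not taken from the cited sources) of this reuse of previous search results: after the environment responds, the agent keeps the subtree under the action and observation that actually occurred and only extends it, instead of rebuilding the whole tree.

# Incremental planning sketch: re-root the search tree after each interaction step
# so that previously computed results survive into the next tick.
class Node:
    def __init__(self):
        self.children = {}   # maps (action, observation) -> Node
        self.value = 0.0     # statistics a real planner would maintain here
        self.visits = 0

def expand(node, actions, observations, depth):
    """Placeholder for whatever bounded search the agent can afford this tick."""
    if depth == 0:
        return
    for a in actions:
        for o in observations:
            child = node.children.setdefault((a, o), Node())
            expand(child, actions, observations, depth - 1)

def step(root, action, observation):
    """Re-root the plan after the environment responds: the relevant subtree survives."""
    return root.children.get((action, observation), Node())

# usage: plan once, act, then keep planning from the surviving subtree
root = Node()
expand(root, actions=["left", "right"], observations=["ok", "bump"], depth=3)
root = step(root, "left", "ok")          # previously computed subtree is reused
expand(root, actions=["left", "right"], observations=["ok", "bump"], depth=3)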
Planning is closely related to other methods of optimizing search. Thus, people make plans and search in terms of certain generalized actions; the more distant the plans, the more abstract the terms in which they are described. The use of generalized actions is obviously heuristic. These actions are also described within the framework of certain representations, but they are not directly derived in the theory of universal induction. In practice, in weak AI methods such representations are specified a priori, and specific planning algorithms are developed for them. This is clearly not enough for a universal AI.
In addition to planning itself as incremental search, and to representations for the search space, there are many heuristic methods for reducing the search. On the one hand, search and optimization methods such as heuristic programming, simulated annealing, genetic algorithms, etc. are highly elaborated in classical AI. On the other hand, there is currently no general solution to the search problem. It is very likely that there cannot be any single a priori effective search method, and the need for some self-optimization strategies is inevitable, since different heuristics and specific search methods are better suited to different tasks.
At present, there is no theory of effective pragmatic general self-optimization capable of inventing arbitrary search heuristics. However, if the method of such self-optimization existed, it would require some general metaheuristics for its acceleration (otherwise it would not be pragmatic).
In general, it is clear that planning, like other methods of reducing brute-force search, is "only" an element of the optimization of computing resources. It can even be introduced not as a heuristic, preserving exact correspondence with IMI, but in this form it will not be very effective. More heuristic implementations of planning will not work for all possible environments, but they can be very effective for a specific, yet very wide, class of environments. Thus there appear such concepts as suspending and resuming the execution of a plan (essentially heuristic in the sense that for certain classes of environments they are meaningless). At the same time, it remains an open question which planning mechanisms (and various search metaheuristics) should be made innate, and which the intelligent agent will have a chance to learn in a foreseeable time.
Knowledge.
Knowledge plays a special role in human intelligence. At the same time, knowledge is not explicitly used in IMI. Instead, holistic models of the history of interaction with the environment are constructed in IMI without knowledge being explicitly extracted from them. In principle, knowledge is often viewed simply as the upper level of hierarchical models of perception and control (for example, as the upper level of the visual system). In this context, little can be added to what has already been discussed in the sections on perception, planning and representations. However, knowledge systems have their own characteristics. In particular, only knowledge representations (rather than lower-level representations) are modality-nonspecific and describe "meaning", and knowledge is used not only to describe internal models of the environment, but also to transfer them between different agents (social interactions constitute a separate block of cognitive bias, which we discuss below).
In general, knowledge representations can probably be found by IMI in the process of self-optimization, but this process requires extremely long interaction with the environment. Useful representations of the environment, abstracted from specific modalities, can further accelerate the extension of IMI towards a pragmatic, effective strong AI. But, again, these representations should not limit the universality of IMI, as happens in almost all existing cognitive architectures and in more specialized knowledge-based systems.
Memory.
We can say that there is memory in IMI, but of the most primitive kind: IMI simply stores all raw data without performing any other function. At the same time, memory is one of the central elements of most cognitive architectures. Human memory, too, is much more complex, and its functions go far beyond mere storage. As is well known, the main function of human memory, and the main difficulty in reproducing it on a computer, is content-based retrieval: we can recall some event, place, object or person from a verbal description, an image fragment, a pencil sketch, etc.
There is nothing of the kind in IMI. Does this mean that a universal agent will be unable to exhibit the behavior that is available to us thanks to our memory? Not at all. We need memory, first of all, in order to predict. We remember the past in order to predict the future (or, at least, to make better choices in the future); it is difficult to come up with any other biological meaning of memory. In fact, natural memory is so closely integrated with the functions of induction and prediction that it is practically inseparable from them in pure form. The particular organization of memory is determined by the need to do this in the computationally most efficient way (taking into account the peculiarities of our world). Let us justify the second thesis separately from the first. If our memory simply stored raw data (for example, as one long movie), then in order to find scenes in this "movie" that meet certain search criteria, one would have to watch the entire movie again, reprocessing each scene. What is the point of doing this if the "movie" has already been watched and interpreted once? Naturally, it is more economical to memorize the descriptions already constructed for it and to search immediately among them.
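A small sketch of this argument (our illustration; the feature names are arbitrary examples): the agent stores the descriptions it has already built during perception and searches among them, instead of rescanning the raw "movie".

# Instead of rescanning raw history, store the interpretations built during perception
# and answer content-based queries against them.
raw_history = []          # IMI-style storage: everything, uninterpreted
episodic_index = []       # list of (description, time) pairs built during perception

def perceive(t, frame, describe):
    raw_history.append(frame)
    episodic_index.append((describe(frame), t))   # "cached" interpretation

def recall(query):
    """Return times of episodes whose stored description matches a partial query."""
    return [t for (desc, t) in episodic_index
            if all(desc.get(k) == v for k, v in query.items())]

# usage
perceive(0, "frame0", lambda f: {"place": "kitchen", "object": "cup"})
perceive(1, "frame1", lambda f: {"place": "street", "object": "dog"})
print(recall({"object": "dog"}))   # -> [1], without re-processing the raw frames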
With unlimited resources such efficiency is not needed, and the entire interaction history is simply reprocessed at each point in time. It has already been noted [Goertzel, 2010] that the absence of memory as a cognitive structure in IMI is connected with the assumption of unlimited resources. But as soon as we want to increase the realism of our universal agent by taking limited resources into account, we will have to complicate the memory structure and integrate it with the procedures of model building, prediction and choice of actions.
In addition to the special functions of natural memory, it has a certain organization (episodic/semantic, short-term/long-term, etc.). In part, this organization follows from the other aspects considered. For example, in IMI the model reproduces the entire history of interaction; it simultaneously describes episodic and semantic content. As soon as representations are introduced that do not reproduce specific data but define the "terms" in which these data are described, the corresponding separation of memory types appears. To understand many features of how memory is organized, one also needs to consider the dynamics of how representations unfold in time.
There are other features of the organization of memory that can provide additional elements of cognitive bias, or heuristics for the search for models and actions. For example, an obvious heuristic is the presence of modality-specific memory. This leads to the banal (but not unimportant in the context of IMI) conclusion that, in order to simplify induction (the process of building models), data of different modalities are interpreted relatively independently. This separation seems so natural as to be self-evident, but again we emphasize that it is essentially heuristic and far from complete.
Let us mention one more very indicative feature of human memory: chunks, which are even taken as the basis of some cognitive architectures [Gobet and Lane, 2010]. They are probably related to the limiting decomposition of models in memory (that is, the division of the entire memorized set of objects into minimal groups united by individual models). It is possible that chunks are only an epiphenomenon of the decomposition of the induction problem, but they clearly show how strongly real intelligence tries to minimize its costs in solving it.
Thus, the characteristics of human memory are an important source of elements of “cognitive bias,” but a correct understanding of these features also requires a detailed analysis within the framework of universal intelligence.
Symbolic and subsymbolic levels.
In the methodology of classical AI there is a fairly rigid division into subsymbolic (for example, neural network) and symbolic (for example, logical) methods. It manifests itself in the division of cognitive architectures into emergent and symbolic. There is now a tendency to unite the two approaches, in particular in the form of hybrid architectures. But the very fact of such a division is remarkable. After all, there is no such division in IMI. Does it exist in humans, that is, is the obvious division into symbolic and subsymbolic levels a feature of natural cognitive architecture?
It is quite obvious that the identification of these two particular levels is related to the fact that the upper level is accessible through consciousness, and the lower level through neurophysiological studies (the results of which can be directly related to the lower levels of sensorimotor representations). Intermediate levels are simply not accessible to direct observation and are therefore much more poorly studied. In this regard, the "middle layer problem" (or the "semantic gap" problem) is sometimes singled out in AI as one of the most difficult. However, the presence of intermediate levels of organization, although it somewhat softens the sharpness of the symbolic/subsymbolic dichotomy, does not negate the fact that there are separable levels of organization in natural intelligence.
Such a clear division is unlikely to arise by itself unless it is built in architecturally. In particular, it could hardly be distinguished in the models formed by IMI (in these models, even if concepts of different levels are present, they will be hopelessly mixed). In natural intelligence, not only do the constructed models of the environment have a rather pronounced multi-level structure, but the methods of working at different levels are also noticeably different (note at least the fact that consciousness is tied primarily to models of the upper levels).
Thus, at the subsymbolic level mainly typical patterns in a large array of sensory data are taken into account, while the symbolic level works with arbitrary patterns, but on strongly reduced data. However, this is only a general description of the levels. When they are introduced, universality should be preserved in the form of forward and backward connections between levels, as well as the possibility of constructing arbitrary computable predicates (basic perceptual concepts) at the subsymbolic (and intermediate) levels. An example is the gestalt laws (the laws of perceptual grouping), which are typical for all people, but may nevertheless differ between people of modern and primitive cultures (which may manifest itself, for example, in (in)susceptibility to some optical illusions). In other words, the laws of perceptual grouping correspond to typical patterns in sensory data, but these patterns may or may not be incorporated into the corresponding representations depending on the particulars of ontogenesis. All this can be interpreted as a general a priori structure of representations, together with heuristics for constructing models within their framework, which (in addition to the concept of representations itself) provide significant savings in computing resources (but without fatally violating universality).

Associations.
There are many cognitive features of human thinking that are somehow associated with association. This is a multifaceted phenomenon, since association can be performed both for representations and for models, and at all levels of abstraction. But in all cases there is something in common. Obviously, the decomposition of the real problems of both induction and the choice of actions is always incomplete. Processes related to associating can be interpreted as processes establishing possible connections between data elements, models and representations which, as a result of the decomposition of the single task facing universal intelligence, were initially treated as independent.

The most obvious such interpretation is for the case of decomposition of the induction problem. An association is established between two models of fragments of sensory data if there is mutual information between them, which can be expressed in statistical terms (frequent joint occurrence) or in structural terms (the existence of a simple algorithm that transforms one model into another). The latter is also the basis of analogies and metaphors.

An example of the most complex form of association is transfer learning, in which representations from one subject area are carried over to other subject areas. The fact that such a principle is possible and useful is evidence of special properties of our world, on whose exploitation transfer learning is based. Although the very existence of such links, of mutual information between different subject areas, is not in itself a special feature of our environment (rather, it would be surprising if there were no such interrelations at all), the ability to find and use these links under conditions of limited resources is indicative. First, it clearly demonstrates the universality of human intelligence, the absence of rigid restrictions on the structure of the representations being constructed and on establishing links between fragments of reality; second, it is also an element of cognitive bias.

It is difficult to say whether the mechanisms of transfer learning, of establishing associations, analogies and metaphors are essentially different, or whether they are different applications of one mechanism. But all these mechanisms (as well as the adaptive resonance mentioned above) can be viewed both as ways of reducing resource requirements and as ways of eliminating the negative effects of this reduction, depending on whether we are moving from pragmatic effective intelligence towards universality or from universal intelligence towards efficiency/pragmatism. At present transfer learning is considered separately from the problems of universal AI, and it is not surprising that modern models of transfer learning are excessively specialized: in them, the mapping between two representations (between which knowledge transfer is carried out) is always set manually and works only for them. The apparent universality of transfer learning in humans suggests that it should fit very closely to the core of a universal AI. Again, there is no separate transfer learning in IMI, and it is not needed there (but only because of unrealistically unlimited resources): any mutual information between any data fragments is taken into account, and transfer at the level of search heuristics is not needed because there are no such heuristics.
Since transfer is carried out at the level of representations, in theory it should appear together with them and allow a smoother transition from universal AI to the use of representations in real AI. Transfer learning [Senator, 2011] is an example of the most developed form of association. No less remarkable (for its extreme prevalence) is the lowest-level form of association. At the behavioral level these are conditioned reflexes (and at an even lower, neural-network level, Hebb's rule). Of course, association itself is often understood as something more complex than just conditioned reflexes (for example, according to V. F. Turchin, association is a system for controlling complex reflexes, that is, a metasystem in relation to the most developed reflexes). However, they have the same basis.

Associating is often regarded as an independent (sometimes fundamental) principle of natural thinking and is contrasted with induction (induction supposedly necessarily builds models, whereas association is generally model-free and not tied to any directed optimization). Of course, behind association stands a very effective metaheuristic reflecting a regularly occurring feature of our world (which, roughly speaking, comes down to the fact that the closer events are in time and space, the more likely they are to be related; but, of course, developed association is not limited to this). Naturally, heuristics (including association) are not derived from the theory of unbiased universal intelligence and in this sense can be considered additional principles. However, associating can be considered neither the sole nor the main basis of thinking. This is clearly seen both in the example of Hebb's rule and in the example of reflexes. Thus, the Hebbian rule by itself is not sufficient for solving complex learning problems associated with the construction of invariants. In the case of reflexes, the main difficulty is not strengthening the connection between two known stimuli, but selecting the classes of associated stimuli, which can be described by arbitrary patterns (the stimulus can be merely switching a light on, a light of a certain brightness or color, switching the light on twice in a row, etc.). It is noteworthy that different animals have different abilities to identify patterns in stimuli. Thus, chickens are unable to learn to choose the brighter feeder (out of several feeders, in one of which lies food that is invisible until the moment of choice). And even for monkeys it is difficult to go beyond the immediate context (for example, to use objects that are not currently in the field of view) in order to reach a fruit. Human intelligence is universal, and this universality is not explained by association, but is combined with it.
Reasoning.
Reasoning is what is often considered thinking proper. Is there any reasoning in AIξ? In a sense, there is. Some of our reasoning comes down to figuring out where this or that action will lead us (and this is exactly what all the resources of AIξ are spent on). Say, thinking about an upcoming conversation, we can guess what we will be told and what we might answer, as well as what emotions we will experience. However, our reasoning is not always directly related to predicting what sensory input and what reinforcement we will receive when performing certain actions. Often we think about things that are not directly related to us. And often in our reasoning (the part that is introspectively accessible to us) there is no hint of induction. Indeed, thinking is more often associated with deduction. It is not by chance that in many expert systems reasoning is modeled by inference mechanisms. There are no mechanisms of logical inference in IMI. Although some models such as AIξtl or Gödel machines do introduce logic to substantiate assertions about algorithms, this has almost nothing to do with ordinary reasoning. Does this not indicate that something fundamental is missing in IMI?

In fact, it fairly obviously does not. In deductive inference methods, a search is performed over admissible chains of inference rules until the assertion being proved, or its refutation, is obtained. Such a search is similar to the search performed in IMI, but over a single fixed model of the environment. The clear difference is that in IMI it is performed at each time step over holistic models of the environment, which are moreover rebuilt to take into account the information just received. In effective pragmatic systems this is simply impossible: one has to consider models of only fragments of the environment, and even treat these models as fixed. The analyzed fragment of the environment may not be directly connected with us, and with respect to it we can consider actions that we do not perform ourselves (for example, we can think about what happens to a planet if a supernova explodes near it). The tendency to analyze fragments of reality only very indirectly connected with us (and even to create imaginary worlds) is quite curious, but it requires a separate discussion and relates, rather, to questions of motivation (the objective function). It is difficult to imagine that IMI, choosing its actions so as to maximize the objective function, would (even if only virtually) indulge in abstract reflections on the structure of the Universe (or rather, seek to obtain the information necessary for this), but there is no contradiction here, especially if it receives reinforcement for creating good cosmological theories.

What is important for us now is that the deductive analysis of models of fragments of the environment is associated with saving resources. The results of computing an algorithmic model under different sequences of actions can be stored and reused as long as the model remains unchanged. Naturally, the ways of achieving such savings are closely related to questions of representations (including declarative representations, which may be connected with an extension of the concept of computability) and can be extremely nontrivial. And, of course, they are not derived from IMI. Thus, logic can be interpreted as a meta-representation useful for analyzing fragments of our world specifically, since the possibility of separating objects and relations is a general (albeit weak) property of it, which might well be irrelevant in some other reality where our logic would be useless. A great deal of work remains to be done here to identify the principles of implementing effective reasoning (which are as far from exhaustive enumeration of elementary actions as image-processing methods are from universal induction based on algorithmic probability). At the same time, "caching" the results of analyzing fixed models raises additional questions related to updating these results when new information arrives (in particular, the well-known problem of the closed-world assumption, or nonmonotonic reasoning), which also require solving.

Here again it might seem that IMI contributes nothing to the solution of problems that are well known even without it (for example, logical inference and truth maintenance). However, we note once again that IMI poses these known problems in a much more general form. Thus, predicate logic within the framework of universal intelligence appears only as a meta-representation, which has a heuristic nature and does not have to be given a priori: a self-optimizing universal intelligence (for example, a human) can learn both logic itself and its effective use; our task is to create such an intelligence and to reduce its training time to an acceptable one. This may be easier than manually creating many particular methods, just as it is easier, say, to implement a certain learning method than to enter all the necessary particular facts by hand. At the same time, when creating AI, one should try to achieve smaller cognitive distortions than in humans.
So, although an a priori preference for dividing the perceived world into objects with properties and relations may significantly speed up learning, this preference should not be too rigid.

Social interactions.
Interaction with other intelligent agents is a very significant part of the environment. These agents are very complex, so the inductive reconstruction of suitable models of other agents would require a very long interaction in the real world and a huge amount of computing resources. Naturally, some theory of mind (the ability to model minds, in particular those of other agents) must be built into an effective pragmatic AI. But in a universal AI it should be added as an element of cognitive bias, which biases the models but does not impose insurmountable restrictions on them.

Social interactions are not limited to predicting the behavior (or reconstructing the models) of other agents as part of the environment. Naturally, social agents interact with each other in the same way as with the rest of the environment, through their sensors and effectors. But through them they can transfer to each other fragments of environmental models, behavioral strategies, and even elements of objective functions. In fact, it is society that forms complex objective functions, inductive bias and search heuristics (in the form of ethics, science, art, etc.), thanks to the exchange of information and computational resources between agents. An unbiased universal agent could, given sufficient time (if during this time someone ensures its survival), learn to correctly interpret sensory data and extract this information from it (although some innate mechanisms would be required for learning the objective functions). But an effective pragmatic intelligence should have this ability a priori, that is, have an inductive preference for social environments [Dowe et al., 2011] or have "communication priors" [Goertzel, 2009]. Of course, the more highly developed an animal is, the less prepared for life its young are born, and a universal AI can be forgiven a long "postnatal" helplessness; but still, such a priori skills as picking out other agents in the sensory stream, and imitation, can very significantly reduce the period of complete helplessness.

An essential (but not the only) aspect of social interactions is language. The analysis of language in the context of universal agents is still little studied.
For example, the importance of two-part coding (within the framework of the minimum message length principle) has been discussed, which allows agents to efficiently exchange the regular parts of models separated from noise [Dowe et al., 2011]. But most of the important questions still await detailed analysis. These include the semantic grounding of symbols, and the obvious consideration that for universal agents it will be most effective (at least at first) to learn the knowledge accumulated by humanity, for which one needs to understand natural languages, which are associated with certain ways of representing knowledge, and the formation of these representations should not require excessive effort from the AI.

One additional important aspect of multi-agent interactions is that the environment is much more complex and computationally powerful than the agent itself. This aspect is not a heuristic or an inductive bias, but it also needs to be taken into account in IMI models.

Emotions.
Emotions are often viewed as a component of cognitive architecture, so they must also be discussed. At the same time, emotions are clearly related to the objective function, so their purpose (unlike that of other elements of the cognitive architecture) cannot be reduced entirely to saving resources and reducing learning time, which is what search heuristics and inductive bias are for.

We have already briefly discussed the problem of the objective function in the previous articles. A "good" objective function (for example, one that at least accurately evaluates survival) cannot be given a priori. The innate objective function is a rough "heuristic" approximation of some "true" objective function. For example, pain and pleasure are a very rough approximation of the fitness function: death can be painless, while a life-saving operation can be accompanied by severe pain. Emotions and other components of assessing the quality of a situation allow a more accurate approximation. Some of them are innate; others are acquired during life.

Here it is necessary to distinguish heuristics approximating the true objective function from heuristics for assessing the quality of states that take into account the potential values of the objective function associated with the expected (predicted) states. Thus, we can avoid the situations we fear without thinking each time about the causes of the fear. These are already ordinary heuristics that do not define the maximized objective function, but reduce the search over possible actions given a fixed objective function. For example, the pleasure of curiosity and aesthetic pleasure can be introduced as separate components of the basic objective function (for which there are models based on algorithmic information theory [Schmidhuber, 2010]). Or an intelligent agent may be curious because it is able to predict from experience that obtaining new information will be useful for its survival (more precisely, for obtaining bodily pleasure and avoiding pain). Since this is a difficult prediction task, the agent can develop a "sense of curiosity" as an element of a continuously computed estimate of future rewards, in order to save computing resources. It should be emphasized that these two options are fundamentally different, since they correspond to different maximized objective functions, which is why the corresponding agents may make different choices in some situations. At the same time, both of these options, as well as their combination, can actually occur.
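A schematic sketch of the first of these options, in the spirit of compression-progress models of curiosity such as [Schmidhuber, 2010] (the specific form and constants below are our illustrative assumptions, not a quotation): the intrinsic reward is the improvement of the agent's model of its history, simply added to the external reward.

# Curiosity as a separate component of the objective function (illustrative assumption):
# reward the agent for making its observed history more compressible.
def intrinsic_reward(code_length_before, code_length_after):
    return max(0.0, code_length_before - code_length_after)

def total_reward(external_reward, code_length_before, code_length_after, curiosity_weight=0.1):
    # curiosity_weight is an arbitrary illustrative constant
    return external_reward + curiosity_weight * intrinsic_reward(code_length_before, code_length_after)

# usage: an observation that lets the agent's model compress the history from 1000 to 950 bits
print(total_reward(external_reward=0.0, code_length_before=1000, code_length_after=950))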
Since human intelligence is pragmatic and effective, its total future reinforcement is predicted primarily without explicit reference to the basic objective function. Because of this, the learning of the objective function itself is closely intertwined with the learning of heuristics for predicting its future values. At the same time, the corresponding learning mechanisms have their own inductive bias, including in the form of some a priori representations. For example, for some emotions there may be no innate mechanisms for "computing" their values, while the relevant types of emotions may be defined at the level of representations. In connection with all this, it is difficult to determine, for human emotions, feelings, etc., to what extent each of them belongs to the components of the basic objective function, and to what extent to the heuristics for predicting its future values. For this reason, psychology has still not reached a common opinion on the mechanisms of emotions, their role and origin. Nevertheless, this whole part of the cognitive system of natural intelligence is quite interpretable in terms of models of universal intelligence, including in terms of increasing their effectiveness.

Attention.
Such a cognitive function as attention is a very broad phenomenon. However, it is quite obvious that its existence is due to resource constraints. For example, visual attention is directed to the most informative or significant (in terms of the objective function) parts of the scene, which means that these parts are analyzed in detail using more resources than the other parts. Naturally, the allocation of resources in solving other cognitive tasks can also be interpreted as attention.

This thesis can be extended to multi-agent architectures. It is hard to believe that intelligence is in principle incapable of solving many problems in parallel while maintaining some kind of unity. At least, this is possible for IMI (and does not threaten it with schizophrenia), because IMI can work with any number of different data sources and conduct induction and choice of actions simultaneously in as many separate "bodies" as necessary. If different data sources contain mutual information, this will be taken into account "automatically". That is, with unlimited resources, an intellect processing data coming from different bodies does not need to focus on any one of them. The phenomenon of attention arises with the introduction of resource restrictions, which implies processing primarily those pieces of data that are relevant to the most pressing tasks, given the limited set of available actions.

It is worth noting another side of the phenomenon of attention: it can also be interpreted as the focusing of actions on a particular object. Such "external" attention concerns the distribution of time not between "internal" computational operations, but between external actions. Such "external" attention in IMI should be realized "automatically": a universal agent should be fully capable of, say, directing a camera towards a sharp sound in order to obtain information essential for avoiding a strong reduction of its objective function (provided, of course, that this agent has a priori information indicating a possible connection between a loud sound and danger). There is no internal attention as allocation of limited resources in IMI, so information about how human attention works may be useful for introducing this element of cognitive bias.

There are many models of attention for cognitive architectures (for example, [Iklé et al., 2009], [Harati Zadeh et al., 2008]). It can be said that attention mechanisms are present even in simple universal solvers (for example, [Hutter, 2002]), which take computational complexity into account and try to allocate resources optimally between the different hypotheses under consideration. Naturally, more developed attention mechanisms must be present in an effective pragmatic AI. But the details of these mechanisms essentially depend on the other parts of the cognitive architecture. Thus, meaningful models of attention should be developed in conjunction with resource-limited extensions of IMI models.

Metacognitive functions.
It is often believed that the main thing separating a computer from a person is the former's lack of such functions as self-awareness, understanding, etc. This opinion is characteristic not only of people far from AI, but also of people who are engaged in it (at least in a philosophical sense). Even strong AI was defined by Searle as an AI possessing all such functions. And the impossibility of true understanding is precisely what Penrose and other proponents of the impossibility of strong AI attribute to computers.
Many specialists who do not limit themselves to general reasoning about AI, but are engaged in developing specific solutions in this area, see much more serious difficulties, for example, in the problems of search, learning, knowledge representation, etc., and do not consider these "human" functions all that complicated. Thus, self-awareness is interpreted simply as a top-level control module that receives and processes information about the operation of the other blocks of a cognitive architecture. It is clear that such metacognitive functions cannot be fully realized without the rest of the intellect. In that case the computer is not endowed with self-awareness not because it is something mysterious and inherent only to humans, but because the more basic functions have not been implemented. Because of this, technically minded researchers often shy away from these aspects of thinking, considering them "humanitarian" and, in contrast to philosophers, interpreting them too simplistically. However, metacognitive functions are beginning to attract increasing attention [Anderson and Oates, 2007] and are even implemented in some form in some cognitive architectures [Shapiro et al., 2007] (although these implementations are quite interesting and informative, in our opinion they are "weak"). It is impossible to avoid discussing them entirely in a conversation about universal intelligence.
Indeed, in IMI models neither self-awareness nor understanding is realized in explicit form, which raises the natural question of whether something important is missing in these models. From an analysis of a number of metacognitive functions (meta-learning, meta-reasoning) it is clear [Anderson and Oates, 2007] that their purpose is related to compensating for the non-optimal performance of basic cognitive functions. At the same time, the cause of such errors in, say, learning that can be corrected by the agent itself can only be that not enough resources were allocated to the corresponding learning task. After all, when using universal induction with unlimited resources, the result in principle cannot be improved on the same data, and meta-learning is meaningless. Of course, metacognitive functions do not reduce merely to the redistribution of resources (that is only one particular trick, which is the prerogative of attention). Thus, in the case of learning, resource savings can manifest themselves in the use of only part of the data, ignoring the context, the use of simplified representations, etc. And meta-learning should be concerned not with allotting more resources to a universal learning method, but with evaluating the success of the learning unit and invoking, for example, more general methods when simpler ones fail. The concept of a so-called metacognitive cycle has even been introduced, in which "what went wrong and why" should be determined [Shapiro and Göker, 2008].
This interpretation of metacognitive functions is too general, and concerning specific functions questions arise. Thus, understanding (which, it is true, is not always interpreted as a metacognitive function, but which nevertheless, in our opinion, has undoubted attributes of such functions) is not so easy to associate with "cognitive bias". There are many examples showing that particular (weak) AI systems do not realize understanding. But these examples do not indicate the fundamental impossibility of machine understanding; rather, they allow us to determine the role of understanding in saving resources. We have already considered a classic example of a chess position ( http://habrahabr.ru/post/150056/ ) in which a computer program that can beat a grandmaster plays incorrectly because of a lack of understanding of the position. With unlimited computing resources, as a result of deeper brute-force search, the program could avoid the erroneous move. Moreover, it is possible that a suitable (algorithmic) description of this situation (for example, in the form of an evaluation function) would allow the erroneous move to be detected without search. That is, understanding a situation is connected with using a representation of it that allows effective actions to be chosen without high computational costs.
A similar conclusion can be drawn from other examples. Thus, the following classical problem is indicative. There is an 8x8 board from which two corner cells lying on the same diagonal have been cut. It is required to tile the board with 1x2 dominoes. An inefficient intelligence (not understanding, but having unlimited resources) could enumerate all tiling options. A person experiences the effect of understanding upon imagining that this board has a chessboard coloring, so that there are 32 cells of one color and 30 cells of the other, while each domino necessarily covers one cell of each color. Choosing the right representation makes the task elementary. Perhaps even more indicative are such problems for "creative thinking" as constructing four equilateral triangles with six matches. Here, too, the choice of how to represent the situation is of fundamental importance.
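The parity argument from this example can be rendered directly in a few lines (our illustration of the representation described above):

# Colour the board as a chessboard, remove two corners of the same colour, and compare
# the colour counts; every 1x2 domino must cover one cell of each colour.
def colour_counts(removed):
    counts = {0: 0, 1: 0}
    for row in range(8):
        for col in range(8):
            if (row, col) not in removed:
                counts[(row + col) % 2] += 1
    return counts

# two opposite corners lie on the same diagonal and therefore have the same colour
print(colour_counts(removed={(0, 0), (7, 7)}))   # -> {0: 30, 1: 32}: a perfect tiling is impossible

The "chess coloring" here plays exactly the role of the effective representation: it replaces an enumeration of tilings with a single count.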
The understanding of images, likewise, is connected with the construction of their descriptions within certain representations (which, as a rule, should facilitate the performance of adequate actions). Apparently, the same can be said about the understanding of natural language, although it raises additional issues.
Perhaps understanding is not the use of effective representations itself, but a metacognitive function that provides an assessment (accessible to consciousness) of the effectiveness of representations. If a person cannot understand something, or does not understand it well enough, he is often (though not always) aware of this; likewise, a person has access to the feeling of having achieved a clear understanding, which should probably be related to the problems of self-optimization.
Access to the inner content of thinking processes is characteristic of all metacognitive functions and is integrally expressed in the phenomenon of self-consciousness. IMI models contain nothing of the kind in explicit form (being ideal, they have no need to control their own thinking), but this does not mean that an IMI agent could not behave as a self-aware agent. But can it correctly use such expressions as “I think,” “I suppose,” “I know,” “I can,” “I want,” “I remember,” etc., if its action-selection methods receive no information about their own operation (and the use of such expressions may be important for survival in an existing multi-agent environment)? It is not easy to answer this question definitively. It is possible that an IMI could use these expressions correctly (pragmatically) without understanding their meaning, but this would require extremely extensive experience of interaction with a social environment and, of course, unlimited computing resources. After all, pronouncing words is not fundamentally different from any other motor output, and if there is a computable mapping between input stimuli and the required outputs, then an IMI can reconstruct it given a suitable interaction history. Nevertheless, the possibility of “unconsciously” performing actions that would seem to require introspective information continues to raise doubts. Fortunately, it is not necessary to dispel these doubts, since in building an effective pragmatic AI, access to such information is useful not only for communication with other agents, but also for self-optimization.
Using this information is non-trivial. Placing an IMI agent in an environment that contains other IMI agents leads to a contradiction (one agent models another agent, which in turn models the first, and so on ad infinitum). Complete introspection would cause a similar contradiction. This contradiction is removed by introducing resource constraints, which, however, violate the abstract ideal intellectuality of the IMI. This means that the problem of introspection (and the problem of “theory of mind” in general) is not solved within the framework of IMI models and requires the development of additional principles. And although the problematics of “theory of mind” (and of metacognitive functions in general) are associated with “cognitive bias” (especially in terms of self-optimization heuristics), they may also be related to a lack of universality in the basic IMI models.
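How a resource bound removes the regress can be illustrated by a deliberately naive sketch (an assumption of ours, not a construction defined within the IMI models): each agent models the other only to a fixed recursion depth, below which it substitutes a cheap default model of the opponent.

# Illustrative sketch (ours): resource-bounded mutual modeling.
# Unbounded "agent A models agent B modeling agent A ..." never terminates;
# a depth (resource) limit cuts the regress at the cost of ideality.

def predict_action(agent, opponent, depth):
    """Predict `agent`'s action by recursively modeling `opponent`.

    `agent` and `opponent` are hypothetical objects with:
      - best_response(predicted_opponent_action)
      - default_action()  # cheap prior used when resources run out
    """
    if depth == 0:
        return agent.default_action()          # resource bound: stop modeling
    opponent_action = predict_action(opponent, agent, depth - 1)
    return agent.best_response(opponent_action)

The choice of depth here is exactly the kind of resource constraint that breaks the abstract ideality of the model while making its use in a multi-agent environment computable at all.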
Conclusions.
We have analyzed several cognitive features of human thinking that can quite naturally be interpreted as heuristics and inductive bias ensuring the effective, pragmatic character of natural intelligence, that is, its acceptable performance in a certain class of environments under conditions of limited resources and learning time.
In general, considerations of limited resources are not new, and it is quite obvious that many cognitive features stem from them. However, no non-trivial examination of the connection between the mathematical theory of universal AI and complex cognitive architectures has been carried out so far [Goertzel and Iklé, 2011]. To establish such a connection, it is necessary not merely to describe cognitive functions superficially, but to introduce them rigorously as extensions of the IMI models while preserving the universality that these models possess. It is this task that we will discuss in subsequent articles.
Literature.
(Anderson and Oates, 2007) Anderson ML, Oates T.
A Review of Recent Research in Metareasoning and Metalearning // AI Magazine. 2007. V. 28. No. 1. P. 7–16.
(Bushinsky, 2009) Bushinsky Sh.
Deus Ex Machina - A Higher Creative Species In The Game Of Chess // AI Magazine. 2009. V. 30. No. 3. P. 63–70.
(Dowe et al., 2011) Dowe D., Hernández-Orallo J., Das P.
Compression and Intelligence: Social Environments and Communication // Lecture Notes in Computer Science 6830 (Proc. Artificial General Intelligence - 4th Int'l Conference). 2011. P. 204–211.
(Gobet and Lane, 2010) Gobet F., Lane PCR
The CHREST Architecture of Cognition: The Role of Perception in General Intelligence // E. Baum, M. Hutter, E. Kitzelmann (Eds), Advances in Intelligent Systems Research. 2010. V. 10 (Proc. 3rd Conf. on Artificial General Intelligence, Lugano, Switzerland, March 5-8, 2010). P. 7–12.
(Goertzel, 2009) Goertzel B.
The Embodied Communication Prior // In: Yingxu Wang and George Baciu (Eds.). Proc. of ICCI-09, Hong Kong. 2009
(Goertzel, 2010) Goertzel B.
Toward a Formal Characterization of Real-World General Intelligence // E.Baum, M.Hutter, E.Kitzelmann (Eds), Advances in Intelligent Systems Research. 2010. V. 10 (Proc. 3rd Conf. On Artificial General Intelligence, Lugano, Switzerland, March 5-8, 2010.). P. 19–24.
(Goertzel and Iklé, 2011) Goertzel B., Iklé M.
Three Hypotheses about the Geometry of Mind // Lecture Notes in Computer Science 6830 (Proc. Artificial General Intelligence - 4th Int'l Conference). 2011. P. 340–345.
(Harati Zadeh et al., 2008) Harati Zadeh S., Bagheri Shouraki S., Halavati R.
Using the Decision Mechanism // Frontiers in Artificial Intelligence and Applications (Proc. 1st AGI Conference). 2008. V. 171. P. 374–385.
(Hutter, 2002) Hutter M.
The Fastest and Shortest Algorithm for All Well-Defined Problems // International Journal of Foundations of Computer Science. 2002. V. 13. No. 3. P. 431-443.
(Hutter, 2007) Hutter M.
Universal Algorithmic Intelligence: A Mathematical Top→Down Approach // In: Artificial General Intelligence. Cognitive Technologies, B. Goertzel and C. Pennachin (Eds.). Springer. 2007. P. 227–290.
(Iklé et al., 2009) Iklé M., Pitt J., Goertzel B., Sellman G.
Economic Attention Networks: Associative Memory and Resource Allocation for General Intelligence // In: B. Goertzel, P. Hitzler, M. Hutter (Eds), Advances in Intelligent Systems Research. 2009. V. 8 (Proc. 2nd Conf. on Artificial General Intelligence, Arlington, USA, March 6-9, 2009). P. 73–78.
(Pavel et al., 2007) Pavel A., Vasile C., Buiu C.
Cognitive system for an ecological mobile robot // Proc. 13 Int'l Symp. on System Theory, Automation, Robotics, Computers, Informatics, Electronics and Instrumentation. 2007. V. 1. P. 267–272.
(Potapov and Rozhkov, 2012) Potapov AS
Visual Stimuli. 2012. (in print)
(Potapov et al., 2010) Potapov AS, Malyshev IA, Puysha AE, Averkin AN // Proc. SPIE. 2010. V. 7696. P. 769606.
(Potapov, 2012) Potapov AS
Principle of Representational Minimum Description Length in Image Analysis and Pattern Recognition // Pattern Recognition and Image Analysis. 2012. V. 22. No. 1. P. 82–91.
(Schmidhuber, 2010) Schmidhuber J.
Artificial Scientists & Artists Based on the Formal Theory of Creativity // In: E.Baum, M.Hutter, E.Kitzelmann (Eds), Advances in Intelligent Systems Research. 2010. V. 10 (Proc. 3rd Conf. On Artificial General Intelligence, Lugano, Switzerland, March 5-8, 2010). P. 145–150.
(Senator, 2011) Senator TE
Transfer Learning Progress and Potential // AI Magazine. 2011. Vol. 32. No. 1. P. 84–86.
(Shapiro et al., 2007) Shapiro SC, Rapaport WJ, Kandefer M., Johnson FL, Goldfain A.
Metacognition in SNePS // AI Magazine. 2007. Vol. 28. No. 1. P. 17–31.
(Shapiro and Göker, 2008) Shapiro D. and Göker MH
Advancing AI Research and Applications by Learning from What Went Wrong and Why // AI Magazine. 2008. V. 29. No. 2. P. 9–10.