
"Unbiased" universal algorithmic intelligence

Formulation of the problem


In the previous articles, "Basics of the approach to building universal intelligence", part 1 (http://habrahabr.ru/post/145309/) and part 2 (http://habrahabr.ru/post/145467/), we described in general terms the various existing approaches and formulated some methodological principles that should be followed when developing a universal AI. In the article "The ideal student, or what is hidden in machine learning" (http://habrahabr.ru/post/148002/), the need to comply with these principles (and, in particular, to preserve universality) was discussed using machine learning as an example. Here we will examine one common model of universal intelligence as a whole. Although this model is extremely far from a real universal AI, it makes it possible to understand the critical flaws of other approaches.

Let us try to consider the model of ideal minimal intelligence (IMI) as a rational agent trying to maximize its objective function (utility function) in a certain world. Here we will completely neglect the required computational resources and try to see whether it is possible to build a universal AI at least under such conditions. Practically without loss of generality, we can assume that a finite set of elementary actions and sensor readings is available to the agent at discrete points in time. The set of sensors includes both external and internal ones, including the objective function. Virtually any particular or general task in the field of AI can be presented in this form by properly selecting the "sensors", "effectors" and "objective function" (although this choice is not always obvious and natural).

Thus, a robot that solves a problem in production, or a program that plays chess, fits this form quite transparently. For agents with general intelligence (such as humans) it is more difficult to draw a similar conclusion. Nevertheless, a person "only" receives sensory information and performs some actions; nothing beyond this is really available to him. Does he optimize some objective function? From an evolutionary point of view we can say yes: it is the fitness function. A person may do this unconsciously (and the function itself is not explicitly given), but those organisms whose choice of actions was not coordinated with this function simply did not survive or left no offspring.

Of course, one can also say that survival can be increased in many different ways, including non-intellectual ones, for example by improving the body and its "physical" characteristics. Some species successfully survive for many millions of years without using developed intelligence; that is, intelligence is not strictly necessary for a perfectly successful optimization of their fitness function. This applies not only to the fitness function, but to any other objective function as well: for example, when creating a system for recognizing certain objects, one can use simple non-specialized sensors and intelligent algorithms for processing the sensory information, or one can use complex specialized sensors that allow recognition to be performed with elementary algorithms. Thus the intellect, although it does not contradict such a formulation (optimization of an objective function), is not reduced to it. Without entering into a dispute as to whether such a point of view is fair, we simply note that a choice of actions is nevertheless required of an intelligent agent, and for any particular method of choice one can select a certain objective function. In this sense we can assume that intelligence optimizes a certain function, of which fitness is one of the components or a special case. The question here is not whether such a representation is permissible in principle, but how convenient it is (and this question is not an idle one with respect to universal intelligence).
Another doubt may be that specifying a limited list of sensors and effectors for a universal intelligence looks like an unreasonable restriction. Yet even for a person (whose intelligence is still rather specialized and not absolutely universal) this restriction is not so severe. A person very widely uses tools to expand the set of his sensors and effectors, and partially interprets them as an extension of his body. Moreover, even at the level of the brain it is possible to reorient cortical areas to process information of a different modality. At the same time, however, the number of "inputs" and "outputs" of the brain does not change at all. All changes concern the structure or context of the corresponding information or actions, which is not prohibited in the statement of the problem for a universal AI. Of course, endowing an AI with the ability to directly connect tools as sensors and effectors of new modalities may be useful, but it is hardly essential, at least for the current stage of analysis.

Another point is that, when a fully embodied AI is considered, interaction with the world may not be limited to the "standard interfaces" of the agent's body: sensors, objective function and effectors. Arbitrary external influences can also be exerted on the program of the intelligent agent itself. Sometimes this is seen as a significant issue [Ring and Orseau, 2011]. However, for a person such a situation means direct intervention in the brain, which is usually not informational but physical (or chemical) in nature, and the human mind is very vulnerable to such interventions. Of course, one can pose the question of creating a universal AI that would be better protected from such influences, but in general this question does not seem to be a priority.

Thus, one of the most common (although perhaps not the most successful) formulations of the problem for universal intelligence is the following. At each moment of time t the agent's sensory input receives values o_t belonging to a certain set O, which may have a certain structure; from the environment, through the "body", come scalar reinforcement values r_t belonging to a given range R = (r_min, r_max). For brevity, we denote the pair (o_t, r_t) by x_t, x_t ∈ X = O × R. The agent also performs actions y_t (belonging to a certain set Y) so as to maximize the total value of the objective function in the future
r_{k+1} + r_{k+2} + … + r_m → max,        (1)
where k is the current time and m is some future moment. Note that we will mainly use notation borrowed from [Hutter, 2005], since the model of universal AI presented there is the most widely known and can be considered a zero approximation to a real strong AI.
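As a rough sketch of this interface (all names below are hypothetical and introduced only for illustration, not taken from the article), the formulation can be expressed as follows:

```python
from typing import NamedTuple, Protocol, Sequence

# Hypothetical types: at each step the agent receives a percept x_t = (o_t, r_t)
# and must emit an action y_t from a finite set Y (here encoded as int).
class Percept(NamedTuple):
    observation: int   # o_t, an element of the finite set O
    reward: float      # r_t, a scalar from the range (r_min, r_max)

class Agent(Protocol):
    def act(self, history: Sequence[Percept]) -> int:
        """Choose y_k given x_1 ... x_{k-1}; the aim is to make the total
        future reinforcement r_{k+1} + ... + r_m as large as possible."""
```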

In this formulation (1) the task is obviously not posed correctly: the agent must maximize the values of r_t for t > k, about which nothing is said in the statement of the problem. In other words, it is necessary to introduce at least some assumptions about the world that would make it possible to establish a connection between the previous values of o_t, r_t, y_t and their future values.

The case of the known computable world


We first consider the simplest case of an environment completely described by a known Turing machine (an algorithm, or a program for a universal Turing machine) q′, whose input at time k is the agent's actions y_1, ..., y_k and whose output is the values o_k and r_k. The agent itself is also controlled by some program p′, whose input is o_1 r_1, ..., o_{k–1} r_{k–1}, i.e. x_1, ..., x_{k–1}, and whose output is the value y_k.

For brevity, a sequence of the form y_m ... y_n will be denoted by y_{m:n}, and for m = 1 we will write y_{≤n} or y_{<n+1}. To denote the set of sequences composed of elements of the set Y, the traditional notation Y* will be used. Then the programs q′ and p′ define the corresponding maps q′: Y* → X and p′: X* → Y, with x_k = q′(y_{≤k}) and y_k = p′(x_{<k}). We will also write o_k = q_o′(y_{≤k}) and r_k = q_r′(y_{≤k}) for the corresponding components of x_k. Here it is assumed that on each clock cycle the programs are started in turn: first p′ is run, and then q′, and at the initial moment of time the string x_{<1} is empty. There is no fundamental asymmetry between the environment and the agent (with the exception of the initial moment of time), since the runs of p′ and q′ constantly alternate, and the assignment of two runs to one cycle is conventional. Here, of course, the computation time of p′ and q′ is not taken into account, but otherwise this formulation looks natural enough. It must be emphasized that there is no "real time" here at all: these programs do not run continuously, but are invoked anew on each clock cycle. Then the current result of the interaction of the agent with the world can be computed in a loop:

for t from 1 to k: y_t = p′(x_{<t}), x_t = q′(y_{≤t}).
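A minimal sketch of this cycle, assuming the agent p′ and the environment q′ are given as Python functions over histories (the names and the toy environment are illustrative, not part of the original formulation):

```python
def run_interaction(p, q, steps):
    """Deterministic interaction loop: y_t = p'(x_<t), x_t = q'(y_<=t).
    Nothing persists between calls: both programs are re-run on the
    whole history at every step, exactly as in the formulation above."""
    xs, ys = [], []
    for _t in range(steps):
        y = p(tuple(xs))      # the agent sees x_1 ... x_{t-1}
        ys.append(y)
        x = q(tuple(ys))      # the environment sees y_1 ... y_t
        xs.append(x)
    return xs, ys

# Toy example: the environment rewards repeating the previous action.
env = lambda ys: (0, 1.0 if len(ys) > 1 and ys[-1] == ys[-2] else 0.0)
agent = lambda xs: 0          # a trivial constant-action agent
print(run_interaction(agent, env, 3))
```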

To get rid of the asymmetry (or rather, of the half-cycles), one could define x_t as q′(y_{<t}). From a formal point of view this leads to the formulation of a different task. Thus, the original formulation with x_t = q′(y_{≤t}) is suitable for describing the game of chess, but not the game of rock-paper-scissors, whereas the variant with x_t = q′(y_{<t}) is the other way around. However, since both of these options ignore the need to compute q′ in real time, with the introduction of which the differences between them disappear, at this stage it makes almost no difference which one to use, except for artificial cases such as these games, whose rules prescribe a specific organization of the sequence of "moves" of the agent and the environment (it is also worth noting the incorrectness of the variant y_t = p′(x_{≤t}), x_t = q′(y_{≤t})). Naturally, in a real game of rock-paper-scissors there is no absolute simultaneity of moves, but whether there is simultaneity at the level of elementary actions is practically unimportant (one can recall actually working robots that beat humans at this game by recognizing the person's choice before indicating their own). Also, in a real game of chess the prescribed sequence of moves can easily be violated "physically".

For convenience, we also introduce programs p and q such that x_{1:k} = q(y_{≤k}) and y_{1:k} = p(x_{<k}), that is, p is a program that not only chooses the current action, but returns the whole history of actions. Equivalent programs p and p′ (q and q′) can easily be obtained from each other, so their introduction here is nothing more than a matter of convenience. Since the generated strings x and y depend on both programs p and q because of the recursiveness of the calls, q(y_{≤k}) = q(p(x_{<k})) = q(p(q(y_{<k}))), a specific element of the history formed by p and q will be denoted, for example, r_t^{pq}, y_{1:k}^{pq}, etc.

Since p and q are deterministic, one can calculate for them the total future gain (after the current moment k, up to some moment m) that the agent will receive:
V_{km}^{pq} = r_{k+1}^{pq} + r_{k+2}^{pq} + … + r_m^{pq}.
Now we can correctly pose the problem of determining the best policy p* and the best current action:
p* = argmax_p V_{km}^{pq},    y_k = p*(x_{<k}).        (2)
The problem of finding the optimal strategy p* for a specific environment q can be solved a priori using formula (2), although building such an optimal intelligent agent can be quite nontrivial, since for reasons of computational complexity it is usually impossible to apply this formula directly in practice.

It might seem that the programs p* for different environments must be fundamentally different. In this regard it is quite noteworthy that there exists a universal program for selecting optimal actions given the environment model q. In fact, equation (2) already defines such a program. But one can propose another universal program, which selects the optimal actions directly rather than through the search for a particular optimal program. Such a program is defined by the following expression:
y_k = argmax_{y_k} max_{y_{k+1:m}} ∑_{t=k+1}^{m} r_t,    where x_t = q(y_{≤t}).        (3)
Indeed, if the known environment program receives a chain of the agent's actions as input, then one can search not for the program p generating the optimal chain, but for the chain itself, and such a search will itself be a universal program. Since the agent program does not depend on the environment program q, but uses it as input data, it looks universal, and it seems that a direct search with unlimited resources should obviously lead to the maximization of the objective function. However, let us discuss the adequacy of the problem statement itself, now that we have a model of universal intelligence, at least for a known environment.
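For a known environment q and a finite horizon, this direct search can be sketched as a brute-force enumeration (exponential in the horizon and usable only on toy worlds; the helper names and the toy environment are assumptions of this sketch):

```python
from itertools import product

def best_action(q, past_actions, actions, horizon):
    """Enumerate all chains of future actions and return the first action of
    the chain with the largest summed reward; the known environment program q
    maps an action history to the percept (o_t, r_t)."""
    best_value, best_first = float("-inf"), None
    for chain in product(actions, repeat=horizon):
        ys, total = list(past_actions), 0.0
        for y in chain:
            ys.append(y)
            _o, r = q(tuple(ys))      # x_t = q(y_<=t)
            total += r
        if total > best_value:
            best_value, best_first = total, chain[0]
    return best_first

# Toy environment rewarding repetition of the previous action.
env = lambda ys: (0, 1.0 if len(ys) > 1 and ys[-1] == ys[-2] else 0.0)
print(best_action(env, [0], actions=[0, 1], horizon=3))   # 0: keep repeating
```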

Adequacy of setting the objective function maximization problem


One of the obvious questions that arise here is the choice of the time interval (or rather, its upper limit) over which to sum. This question, bypassed by writing equations (1) - (3), has no obvious answer and requires analysis. Let's look at a traditional example of a chess position.
[Chess diagram: a position in which capturing the rooks leads to a loss, while a draw can be secured only by refusing the captures indefinitely.]
The idea here is that capturing the rooks will lead to a loss, while a draw is achieved only in the limit. No finite search depth will allow a brute-force algorithm to "understand" that a permanent refusal to capture a rook is equivalent to a draw. This, however, will not prevent the agent from acting adequately in this situation: after all, with a sufficient search depth, a capture leads to a loss in a finite number of moves, so the rating of capturing the rook will be lower than that of any other move (although the agent will not "understand" the equivalence of indefinitely postponed capture and a draw). In addition, in a practical sense an agent cannot exist forever, so an infinitely long game is nothing more than an abstraction. The lifetime of the Universe itself is finite, and it is difficult to imagine a realistic situation in which an embodied agent would need an unbounded depth of search.

At the same time, such an agent is also quite capable of working with the concept of infinity as an element of some symbolic representation. And since in humans symbolic representations have a semantic basis in sensory and motor experience, that is, they are simply some combinations of x and y, this should in principle be available (although not explicitly) to the agent under consideration. Nevertheless, it is quite possible that the concept of infinity has some distinguished semantic basis, which makes reasoning about infinite (in particular, cyclic) processes much more natural for a person than for an enumeration algorithm. In addition, non-halting processes are related to an extension of the classical concept of computability (so-called computability in the limit), so they may be an important component of the theory of universal intelligence, as noted, for example, in [Schmidhuber, 2003]. However, for now we are only interested in whether our agent will act adequately, and there are no examples of an obviously inadequate choice of actions here.

Returning to the question of the time range of summation, it is worth noting that merely accepting that this range may be limited does not answer the question of how exactly it should be limited. In particular, restricting this range to the expected lifetime of the agent leads to inadequate (or rather, undesirable for people) behavior. For example, such an agent will not save anyone at the cost of its own life and in general will inevitably act according to the principle "after us, the deluge". Naturally, when creating a strong AI it will be necessary to solve the problem of choosing an appropriate summation range or, more generally, a desirable objective function that evaluates not only the personal benefit of the agent. Otherwise, the extrapolation of this function to moments of time after the agent's death will always give the minimum values. But then it turns out that the objective function must continue to "exist" in the absence of the agent itself.
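A toy numeric illustration of how the choice of the summation horizon can flip the "optimal" decision (the reward streams are invented for the example):

```python
# Plan A pays off immediately, plan B pays off only after step 3.
rewards = {
    "A": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "B": [0.0, 0.0, 0.0, 2.0, 2.0, 2.0],
}

def best_plan(horizon):
    return max(rewards, key=lambda plan: sum(rewards[plan][:horizon]))

print(best_plan(3))   # 'A' -- with a short horizon the delayed payoff is invisible
print(best_plan(6))   # 'B' -- with a longer horizon the preference flips
```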

The example of the minimal intelligent agent shows particularly well how difficult it is to specify the objective function. The agent itself is guided only by "raw data" and chooses chains of elementary actions. It does not use any concepts explicitly, which does not prevent it from acting adequately thanks to exhaustive search. But we need to specify the objective function, which is precisely what is not expressed in the readings of simple sensors. To compute such a function, a separate intellect is needed, not a universal one but a specialized one. Indeed, in the case of simple worlds such as chess, specifying the true objective function determined by the rules of the game is quite possible; but if we imagine the real world, even one for which we had an exact physical model allowing us to determine the state of the world at any moment in the future, it would be very difficult to translate information about this state into the values of an objective function defined even simply by the survival of the agent itself.

If the objective function is limited only to the readings of simple sensors, in particular to pain and pleasure, then one should expect selfish behavior. One can, of course, "educate" such an agent by reward and punishment. As long as the agent does not have the means to eliminate the punishment imposed by people or other intelligent agents, it will act in the desired way. However, as soon as such an agent can predict the absence of negative consequences for itself, it will act unethically. The idea of "educating" the AI is correct in general, but for it to work, this upbringing must not only build on the existing personal "bodily" objective function, but also change it.

Even if we set aside the problem of ensuring the "friendliness" of AI and simply pose for it the task of its own survival, maximizing an a priori "bodily" objective function does not fully serve this task. Indeed, pain and pleasure are only heuristic approximations of the evolutionary fitness function. They are akin to evaluating chess positions by the material value of the pieces. At the same time, in life, unlike chess, the true objective function of survival is not given: we cannot a priori determine in which situations we are alive or dead. If in chess, too, one were guided only by a rough "material" evaluation, the level of play would be very far from optimal. Likewise, animals (as, indeed, to a certain extent, people) can maximize the "bodily" objective function to the detriment of survival. This is most clearly seen in experiments in which electrodes were implanted into the pleasure centers of rats so that the animals could activate them on their own. Since the objective function determines only the current quality of the state of the organism, its extrapolation cannot explicitly predict death. The action of the electrode increases the value of the objective function irrespective of the true state, and therefore the behavior of the animal is far from optimal. Under certain conditions, the behavior of a "universal" artificial intelligence tasked with maximizing an innate objective function may be the same.

There are attempts to propose innate objective functions that would not lead to degenerate behavior [Ring and Orseau, 2011]. One can consider the hypothetical reactions of agents with different objective functions to various dilemmas. Say, will an agent agree to unlimited enjoyment in a virtual reality that it cannot leave until the end of its life? If we recall the possibility of modifying the intelligence of the agent itself, which is unaccounted for in the simplest formulation, then an even tougher dilemma can be posed: unlimited enjoyment at the cost of maximally simplifying the agent's intelligence. It is quite obvious that an intelligent agent maximizing a purely "bodily" objective function will make a choice that not all people would agree to. At the same time, the objective function can be built on the basis of the work of the intellect itself, for example, by taking into account the diversity of the information received. There is an opinion [Orseau and Ring, 2011] that agents with such motivation will not be subject to the "wrong" choice in such dilemmas, while still striving for survival as a necessary condition for the realization of cognition. From the traditional evolutionary point of view, however, it is rather the other way around: the desire for knowledge is a heuristic within the objective function of survival. As we can see, the analysis of these problems requires more detailed models of universal intelligence.

Thus, the model of maximization of a given objective function is not entirely universal; rather, it can be considered a model of universal intelligence whose task is limited to carrying out this maximization over a fixed time interval. But it cannot be considered a model of an intelligent agent as a whole, of which the objective function itself is a part, and whose task is, at a minimum, survival. In the end, we are of course interested in the creation of an intellectual agent (and not only in its survival). However, for the time being we will limit ourselves to considering models of pure intellect only.

Indefinite environment


The case of a fully known deterministic environment with a given objective function is banal. In fact, the whole field of AI began with the consideration of this case, when the "labyrinth" hypothesis of thinking (thinking as search through a maze of possibilities) was put forward. And, of course, researchers very quickly moved from the general formulation of the problem to the question of how to solve it in practice under conditions of limited computing resources. At the same time, a fully determined environment means the non-autonomy and non-universality of the corresponding intelligent programs. Thus, it is possible to create programs that are more effective than a person in each of these restricted worlds, but this does not allow a computer to solve new problems on its own. It is quite natural to immediately consider the general case, including an arbitrary and uncertain environment. Of course, one may ask: is it worth undertaking a more complex task when there is no complete solution even for the simpler problem of a deterministic environment (with a limit on resources)? But the history of science offers many examples where the solution of a simpler, particular problem could not be obtained until a more general problem had been solved. Here, consideration of the problem statement for a more universal intelligent agent, even if it does not lead to a complete solution, will allow us to better understand the problem of AI as a whole and facilitate the solution of somewhat more particular problems.

What does our lack of complete information about the environment mean? It means that we do not know exactly the algorithm of the environment q. But what does the agent know about the environment? Let there be some set of algorithms, each of which may be the true one (this set can be algorithmically complete). And let the agent be given a distribution of a priori probabilities μ(q).

The introduction of such a distribution may seem strange. After all, the algorithmic models under consideration are deterministic and either correspond to the environment or not. However, the least contradictory interpretation of probability is not the statistical but the informational one: a probability is ascribed to an event, a model, etc., insofar as we lack complete information for its prediction or selection. Here we simply assume incomplete information about the environment model, and this incompleteness is flexibly described by a probability distribution. It should be emphasized, however, that the distribution μ(q) cannot be considered "true"; it is more correct to think of it as the best one possible given the available a priori information. Although under some conditions this probability distribution can be given an entirely statistical meaning. For example, if the "environment" is an opponent in a particular game of chess, then the a priori probabilities can denote, say, the frequencies of occurrence of opponents playing ideally according to the minimax principle, of professionals, of beginners, etc. In this case it can be assumed that there is a certain general population of adversaries, that is, of "environments", described by the "true" distribution μ(q), although a certain degree of convention remains here.

Having a probability distribution μ(q), it is easy to modify the formulation of the problem for the agent as maximization of the mathematical expectation of the future reinforcement:
V_{km}^{p} = ∑_q μ(q) ( r_{k+1}^{pq} + … + r_m^{pq} ).
Again, this expression is not original and is given here in accordance with, for example, [Hutter, 2005]. The optimal agent model will then be defined as
p* = argmax_p ∑_q μ(q) ( r_{k+1}^{pq} + … + r_m^{pq} ),    y_k = p*(x_{<k}).
It would seem that this expression is written correctly. It is interesting, however, that the program p* cannot be chosen a priori in accordance with it and then left unchanged. As was emphasized above, we are now considering deterministic environments, so the distribution μ(q) cannot have the meaning of a true stochastic model of the environment. Instead, μ(q) is a priori the best indefinite model of a deterministic environment.

For example, if the environment outputs binary symbols about which nothing is known in advance, we can assign P(0) = P(1) = 0.5. This does not mean that the environment is a truly random source of bits: it merely reflects our complete ignorance of which deterministic program actually generates the sequence. As the history of interaction accumulates, this ignorance decreases: some candidate programs turn out to be incompatible with the observations.

But if some q does not agree with the existing history, can the inequality μ(q) ≠ 0 still hold for it? Obviously not. Therefore, the distribution μ(q) and the mathematical expectation of reinforcement should be recalculated as the history of the agent's interaction with the environment accumulates. For indefinite deterministic models one can adopt a simple recalculation rule at the moment k:
μ_k(q) = C_k μ(q)  if x_{1:k} = q(y_{≤k}),    μ_k(q) = 0  otherwise,
where C_k is a constant (for the history existing at the moment k) providing the normalization of μ_k(q). Since the environment models are deterministic, their probabilities do not change smoothly, for example according to Bayes' rule: the models are simply eliminated. Does the use of deterministic programs p and q impose any restrictions on the universality of an intelligent agent? This question is not clear-cut, since the very presence of true randomness in the real world is debatable. And although the least controversial formalization of the concept of probability is achieved within the framework of algorithmic information theory, in the zero approximation it can be considered that deterministic models are sufficient. We will return to this question below.
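A minimal sketch of such a recalculation for deterministic environment models (μ is a dictionary of prior weights over named candidate programs; all names are illustrative, and it is assumed that at least one candidate remains consistent):

```python
def eliminate_inconsistent(mu, candidates, ys, xs):
    """mu_k(q) = C_k * mu(q) if the program q reproduces the observed history
    x_1..x_k from the actions y_1..y_k, and 0 otherwise (the model is dropped)."""
    surviving = {}
    for name, q in candidates.items():
        if all(q(tuple(ys[:t + 1])) == xs[t] for t in range(len(xs))):
            surviving[name] = mu[name]
    norm = sum(surviving.values())
    return {name: w / norm for name, w in surviving.items()}

candidates = {
    "always-0": lambda ys: (0, 0.0),
    "echo":     lambda ys: (ys[-1], 1.0),
}
mu = {"always-0": 0.5, "echo": 0.5}
print(eliminate_inconsistent(mu, candidates, ys=[1, 0], xs=[(1, 1.0), (0, 1.0)]))
# {'echo': 1.0} -- the model contradicting the history has been eliminated
```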

Now it is easy to refine the expectation of reinforcement and the optimal strategy at the moment k, taking into account the available history:
p_k* = argmax_p ∑_q μ_k(q) ( r_{k+1}^{pq} + … + r_m^{pq} ),    y_k = p_k*(x_{<k}).        (4)
At k = 1 this reduces to the a priori choice of a strategy p*, analogous to (2). One might think that, since the distribution μ(q) is given a priori, it would be enough to choose the program p_1* once according to (4) and then use it unchanged for all k > 1; however, the program p_1* chosen according to (4) does not necessarily remain optimal for k > 1. It may seem that there is some kind of contradiction here. However, there really is none. There are many algorithms that, for the current history and a given distribution μ(q), give the maximum value of the expectation of future reinforcement. Which of them is returned by argmax in (4) is not specified. And this algorithm may well be suboptimal when new data are received. However, among all optimal algorithms there are universal ones, in particular those that themselves perform the search for further optimal algorithms according to (4).

Naturally, there is also a universal algorithm for selecting the optimal action that searches not over algorithms for generating actions, but directly over chains of actions:
y_k = argmax_{y_k} max_{y_{k+1:m}} ∑_q μ_k(q) ∑_{t=k+1}^{m} r_t^{q},    where x_t = q(y_{≤t}).        (5)
where the maximization and summation are performed for the history available at the moment k.

Expressions (4) and (5) differ only in that the former searches over programs while the latter searches directly over chains of actions.

Equations (4) and (5) are equivalent to similar equations in [Hutter, 2005]. Here we give them without refinement, although we believe they are not entirely correct. The point is that according to (4) it is just as impossible to choose the optimal program p_k* as it was impossible to choose a constant p_1* a priori. These equations do not take into account the process of change of the distribution μ(q) as information accumulates. Depending on the choice of action, the distribution μ(q) may remain unchanged or may narrow. Exploratory behavior implies a choice of actions aimed at obtaining information, that is, at reducing uncertainty as much as possible. It should be expected that exploratory behavior will not be exhibited by an agent acting in accordance with (4) and (5). But here we will continue to follow the presentation of [Hutter, 2005].


In equation (5), compared with (3), summation over all the given environment models has been added, which makes the task of choosing the optimal action computationally even harder, but this is necessary for the universality of the agent. At this stage of consideration it is more important that an a priori distribution over environments for the real world is simply not available.
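Combining the elimination step with the direct search over action chains gives a sketch of action choice in an uncertain environment. Note that this is an open-loop simplification: the full expression interleaves maximization over actions with summation over environments, whereas here a whole future action chain is fixed at once; the function and argument names are assumptions of the sketch:

```python
from itertools import product

def best_action_uncertain(mu_k, candidates, past_actions, actions, horizon):
    """Pick the first action of the chain maximizing the reward expectation
    over the environment models that survived elimination (weights mu_k)."""
    best_value, best_first = float("-inf"), None
    for chain in product(actions, repeat=horizon):
        expected = 0.0
        for name, weight in mu_k.items():
            q, ys, total = candidates[name], list(past_actions), 0.0
            for y in chain:
                ys.append(y)
                _o, r = q(tuple(ys))
                total += r
            expected += weight * total
        if expected > best_value:
            best_value, best_first = expected, chain[0]
    return best_first
```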

The problem of universal prediction


How can one build a model of a universal intelligent agent if μ(q) is not given? It must be remembered that by μ(q) we mean not the true distribution, but the best one given the available prior information. But in reality even this distribution is extremely problematic to construct. What distribution should be taken when there is a minimum of a priori information?

Let us denote this distribution by ξ. For any q the condition ξ(q) ≠ 0 must hold, otherwise some environments would be excluded in advance regardless of experience. The simplest choice ξ(q) = const is unsuitable: since the set of programs is infinite, such a ξ(q) cannot be normalized and is not a (proper) distribution. Thus ξ(q) must somehow decrease for longer programs, and it is not obvious how to choose this decrease without introducing an arbitrary bias.

An adequate solution is given within the framework of algorithmic information theory, in which it is not the amount of information that is determined through probability, but probability that is determined through the amount of information, which, according to A.N. Kolmogorov, must have a purely combinatorial basis. As a result, the prior ξ(q) = 2^{–l(q)} is introduced, where l(q) is the number of binary symbols in the record of the program q. When prefix codes or other self-delimiting codes are used, the total probability 2^{–l(q)} over all bit strings (programs) does not exceed 1.

This definition is quite reasonable: probability and the amount of information are related by the standard relation, and the amount of information in a model is naturally measured by the number of bits in its description. The latter, of course, can hardly be justified absolutely strictly; rather, this definition has the character of an axiom, which, however, proves entirely adequate in practice. One could stop here and simply use ξ(q) instead of μ(q), but it should be made clear that the significance of the universal distribution ξ(q) goes far beyond its use in (4) and (5).
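For a prefix-free set of program codes the weights 2^{–l(q)} indeed sum to at most one (Kraft's inequality); a tiny numeric check of this property:

```python
# A prefix-free set of binary code words (no word is a prefix of another).
programs = ["0", "10", "110", "111"]

weights = {q: 2 ** -len(q) for q in programs}
print(weights)                # {'0': 0.5, '10': 0.25, '110': 0.125, '111': 0.125}
print(sum(weights.values()))  # 1.0 -- Kraft's inequality holds with equality here
```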

In the field of machine learning, induction and prediction, the problem of a priori probabilities is fundamental. Ignoring it or solving it incompletely leads to the problem of overfitting, which is characteristic of most weak machine learning methods, including most types of artificial neural networks. Within the statistical approach, the a priori probabilities for a given problem can be determined only from a sample of already solved problems of the same class. But even in this case, to estimate the a priori probability distribution, given in some form, a priori probabilities over the probability distributions themselves are necessary. This is a vicious circle from which it is in principle impossible to escape by statistical methods. As a result, the prior distribution specifying the (meta)model is introduced heuristically by the developer.

A similar problem is also seen in classical information theory: there the amount of information in a message is introduced as the length of the optimal code under a known (statistical) model of the message source. Under such conditions it is impossible to determine either the individual probability of, or the amount of information in, say, a single binary string when its source model is a priori uncertain.

Under such uncertainty, it is necessary to search for the optimal model of the information source in the space of all possible models (which is the space of algorithms). But if knowledge of the optimal model allows the most efficient compression, then inverting the problem of optimal coding should allow us to find the best model. In other words, if we need to transmit data encoded in the most compact way, then the universal way to do this is to transmit the shortest program that reproduces these data. The length of this program will be the amount of information in the data or, more precisely, their Kolmogorov (algorithmic) complexity, and the probability ξ(q) of this program can be attributed to the data. However, since the same data can be generated by different programs, the a priori (algorithmic) probability of the corresponding data string is more correctly defined as the sum of the probabilities of all these programs. Thus, the algorithmic complexity and the algorithmic probability have the form:
K(x) = min { l(q) : U(q) = x },    P_ALP(x) = ∑_{q: U(q) = x*} 2^{–l(q)},        (6)
where U is a universal Turing machine (the notation U(q) is equivalent to q(Λ), Λ being the empty string), and x* denotes any string beginning with x.

Note that in [Hutter, 2005] and [Hutter, 2007] both the prior ξ(q) = 2^{–l(q)} over environment programs and the algorithmic probability P_ALP(x) over data strings are used, depending on the variant of the model; we will not dwell on the difference between them here.

Algorithmic probability immediately yields a universal method of prediction, proposed by R. Solomonoff [Solomonoff, 1986]: the probability that an observed string x will be continued by a string x′ is
P(x′ | x) = P_ALP(xx′) / P_ALP(x),
where P_ALP(x) is the probability of generating a string that begins with x, and P_ALP(xx′) is the probability of generating a string that begins with the concatenation xx′.
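These definitions can be imitated by brute force on a toy "machine" in which a program is itself a bit string and "running" it means repeating it cyclically; this is of course only a stand-in for a universal machine, intended to show the mechanics of the sums:

```python
def run(program, length):
    """Toy stand-in for U(q): repeat the program string cyclically."""
    return (program * length)[:length]

def alp(x, max_len=12):
    """Toy P_ALP(x): total weight 2**-l(q) of all programs q up to max_len bits
    whose output begins with the string x."""
    total = 0.0
    for l in range(1, max_len + 1):
        for i in range(2 ** l):
            q = format(i, "0{}b".format(l))
            if run(q, len(x)) == x:
                total += 2.0 ** -l
    return total

x = "010101"
p0 = alp(x + "0") / alp(x)   # predicted probability that the next bit is 0
p1 = alp(x + "1") / alp(x)
print(p0, p1)                # the regular continuation '0' receives the larger weight
```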

It is important to emphasize here that the universal probabilities that solve the problems of the statistical approach are introduced on the basis of purely deterministic models. This suggests that the apparent ignoring of probabilistic models does not at all narrow the universality of algorithmic AI, the construction of which makes it possible to do without the concept of probability.

However, this concept is very productive and allows one to avoid some risky steps. For example, in [Hutter, 2005] one can encounter the unjustified assumption that the environment model has finite complexity. This makes the environment deterministic in the exact sense. But at the same time we have no guarantee that, as the sensory history x_{1:k} grows without limit, the complexity of the shortest program q consistent with it remains bounded. A "truly random" environment can be understood precisely as an environment of unbounded complexity: at every moment there exists a finite shortest program q* reproducing the current history, but this q* keeps growing as the history grows, so no finite model can be considered final.

A detailed discussion of algorithmic complexity and universal prediction can be found, for example, in [Solomonoff, 1997], [Li and Vitanyi, 1997], [Potapov, 2007]. Here we note only one more essential point.

The quantities in (6) depend on the choice of the universal machine U on which the programs are run (the same applies to the equivalent notation q(Λ), that is, running the machine q on the empty string Λ, and to the prior ξ(q)).

The traditional answer to this problem is that any universal machine U can be emulated on another machine V by means of some program u, so that for any program q, V(uq) = U(q). Consequently,
K_V(x) ≤ K_U(x) + l(u),
and similarly P_V(x) ≥ 2^{–l(u)} P_U(x), where P_V, P_U are the algorithmic probabilities determined by the respective machines. That is, the prior probabilities defined by different universal machines differ by no more than a constant factor. In this regard it is said that, given a sufficient amount of input data, the dependence of induction and prediction on the choice of the reference machine disappears. This conclusion cannot be drawn for a non-universal machine V, since for it there will be environments q such that P_V(q) = 0, and no amount of data will make them predictable. For universal machines, however, the constant factor 2^{l(u)} can be arbitrarily large, so for any realistic amount of data the choice of the reference machine is far from indifferent: it is precisely in this choice that the a priori information available to the agent is embodied.

Thus, a completely "unbiased" universal intelligence is impossible: some a priori information is inevitably embodied in the choice of the reference machine, that is, in the innate "bias" of the agent. The question is only whether this bias is chosen arbitrarily or is made adequate to the real world in which the agent will have to act.

AIξ


The AIξ model [Hutter, 2005] is obtained from (4) and (5) by replacing the unknown distribution μ(q) with the universal prior ξ(q):
p_k* = argmax_p ∑_q ξ_k(q) ( r_{k+1}^{pq} + … + r_m^{pq} ),    y_k = p_k*(x_{<k}),        (7)
y_k = argmax_{y_k} max_{y_{k+1:m}} ∑_q ξ_k(q) ∑_{t=k+1}^{m} r_t^{q},    where x_t = q(y_{≤t}),        (8)
where ξ_k(q) is recalculated from the accumulated history in the same way as μ_k(q) above.

In other words, the agent p evaluates possible future observations x_{>k} by enumerating all environment programs q that agree with the available history, weighting them according to (7). In this sense AIξ is completely defined and requires no a priori information about the environment (apart from the choice of the reference machine).
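Putting the pieces together, an AIξ-style choice of action can be sketched by weighting each candidate environment program with 2^{–l(q)}, discarding the programs inconsistent with the history, and then searching over future action chains. The sketch below reuses the hypothetical eliminate_inconsistent and best_action_uncertain functions from the earlier sketches and inherits their open-loop simplification:

```python
def aixi_like_action(candidates, ys, xs, actions, horizon):
    """candidates: {binary code of the program q: environment function}.
    The prior weight of q is 2**-l(q); models inconsistent with the history
    (x_1..x_k produced by y_1..y_k) are dropped, and the surviving ones are
    used in the search over future action chains."""
    prior = {code: 2.0 ** -len(code) for code in candidates}
    xi_k = eliminate_inconsistent(prior, candidates, ys, xs)        # sketch above
    return best_action_uncertain(xi_k, candidates, ys, actions, horizon)
```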

The same remark applies to AIξ that was made above for (4) and (5): the narrowing of the distribution ξ_k(q) as new data arrive is not taken into account at the moment of choosing an action, so genuinely exploratory behavior should not be expected from such an agent. In addition, the model remains incomputable. Nevertheless, it is precisely this construction that is usually regarded as the reference model of universal algorithmic intelligence.

Once again let us answer the question: why do we consider such models a fundamentally necessary starting point? The answer lies in the fact that the "practical" approach has convincingly demonstrated its failure with respect to building a strong AI. The universality property is lost in it from the outset and cannot be introduced post factum, and without it the AI will be fundamentally limited. It is precisely this shortcoming that is eliminated in models of universal algorithmic intelligence. However, one should also be cautioned against overestimating the importance of the simplest of such models, since they are just as inefficient as systems of weak AI are non-universal.

In what follows we will take the AIξ model as such a starting point. Let us now discuss to what extent it can really be considered universal, leaving aside (for the time being) the question of computational resources.


Do IMI models (models of the AIξ type) really describe universal intelligence? We have already seen that the statement of the problem for AI as maximization of the objective function is incomplete: the very problem of specifying the objective function turns out to be very difficult (one can also include here the more particular problem of specifying the prediction horizon). But is IMI universal at least with respect to solving this problem?

There are various subtle aspects here. For example, algorithmic probability is not computable because of the halting problem. We have already noted that the result generated by a non-halting model can be interpreted in the sense of computability in the limit. Depending on the concept of computability used, non-halting models (and, from a practical point of view, simply long-running models) can be taken into account or ignored when computing algorithmic probability. This aspect will require clarification in the future.

Another aspect of the IMI model worth discussing is determinism. Consider an elementary game, rock-paper-scissors (RPS). Each player has a fixed set of actions, and there are unambiguous rules for winning. Consider it in the case of a finite sequence of rounds.

We can build a tree of variants (action/response) of a given depth for the case o_k = q′(y_{<k}), y_k = p′(x_{<k}); r_k is determined by o_k and y_k. At k = 1 the tree will be symmetric, that is, there will be no preference between actions. However, a deterministic algorithm will in this case choose some action uniquely, after which the symmetry is broken. From the history of the rounds, each opponent will try to reconstruct the program of the other. For simplicity, let us assume that the opponents' programs coincide and are known to each other a priori. These can be programs, for example, of the form (2). But if the programs are deterministic, then they define an unambiguous sequence of actions, and therefore each opponent can easily compute the actions of the other and make a winning choice. Here we see an obvious contradiction: after all, both opponents cannot win!

Indeed, there is a contradiction; it is very simple but very deep. The point is that when computing y_k, the program p runs the program q, which is obvious if p is described by equation (3). But in this case the program q has the same form and runs the program p, which in turn runs q again, and so on. It would seem that the program (3) includes only a finite number of calls to the program q: from the environment we implicitly expect some directly computable reaction to the agent's action. However, as soon as the environment is itself another agent whose program includes running the program of the first agent, infinite recursion results, and no choice of action is made at all. Even an infinite amount of computational resources will not help here. Hence, interaction with other intelligent agents is a fundamental unaccounted-for aspect.
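The recursion can be demonstrated in a few lines: a deliberately degenerate sketch in which each "ideal" player computes the other's move in order to answer it:

```python
import sys
sys.setrecursionlimit(100)   # keep the inevitable failure quick

COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # move -> what beats it

def player_a():
    # "Ideal" player A predicts B exactly and answers with the winning move.
    return COUNTER[player_b()]

def player_b():
    # B reasons in exactly the same way about A.
    return COUNTER[player_a()]

try:
    player_a()
except RecursionError:
    print("no move is ever chosen: each exact model of the opponent calls the other")
```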

Of course, one might think that this paradox can be eliminated by weakening the requirement of "ideality" of the AI, that is, by considering more detailed models with limited resources. It can be said that with limited resources some choice will inevitably be made, but then in RPS a deterministic agent with fewer resources will lose to an agent with superior resources. At the same time, there is a banal strategy in RPS that guarantees a draw on average, without any computational costs, against an adversary with arbitrarily large resources: the choice of a truly random (not pseudo-random) action, which is completely absent in AIξ.
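The "banal" randomizing strategy and its guaranteed average draw are easy to check numerically. The opponent below is an arbitrary deterministic function of the history and is invented for the example; Python's pseudo-random generator stands in here for the truly random sensor discussed in the text:

```python
import random

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value

def score(a, b):
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def deterministic_opponent(history):
    # Any deterministic strategy: e.g. counter the random player's previous move.
    if not history:
        return "rock"
    counter = {loser: winner for winner, loser in BEATS.items()}
    return counter[history[-1][0]]

random.seed(0)
total, rounds, history = 0, 100_000, []
for _ in range(rounds):
    a = random.choice(MOVES)               # uniform (pseudo-)random choice
    b = deterministic_opponent(history)
    total += score(a, b)
    history.append((a, b))
print(total / rounds)                      # close to 0: a draw on average
```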

As noted, the presence of true randomness in the world is debatable, but in practice it is not difficult to implement a sensor that allows an agent to generate a sequence of numbers of ever-increasing algorithmic complexity. It is also worth noting that the RPS example is not at all contrived: in the interaction of agents such as predator and prey, the randomness of the choice of trajectory during a chase is just as important.

So, it can be assumed that the agent should be able to make a random choice, which in principle cannot be replaced by a deterministic algorithm (as can be seen in particular from the RPS example, and in general when interacting with other agents). If the multi-agent environment in which the AI finds itself is characterized by a stochastic algorithm, it is worth taking this into account when building models of it, and such models must inevitably include abstraction (the construction of fundamentally incomplete or inexact models). Such abstraction is necessary not only when resources are limited, but also in the case of unlimited resources, in order to eliminate the contradiction that arises when p and q use exact models of each other (it is clear from constructivist considerations that an agent cannot exactly model a world that includes the agent itself, since the agent cannot be more complex than itself).

Here the natural question also arises: will IMI have reflection, self-awareness? One can pose a more objectivist question: will it be able to use correctly such concepts as "think" and "know"? On the one hand, the need for reflection in explicit form is absent from the statement of the problem for IMI, and it can be assumed that it will necessarily have to be introduced when resource constraints are considered. On the other hand, even granting IMI unlimited resources will not relieve it of the need to apply concepts related to its own thinking in a particular social environment. Since IMI can construct any algorithm and choose any action that increases the objective function, it can be assumed that it will be able to use these concepts indirectly (given a sufficiently long interaction history), but it will not have a direct semantic basis for them.

The sufficiency of algorithmic models of the world as a whole can also be questioned. There is a well-known opinion that cognitive functions such as understanding or self-awareness are connected with noncomputable physical processes. This opinion could perhaps be refuted only by creating a strong AI (and some will not agree to ascribe self-awareness or understanding to it even then); however, the arguments in its favor are also very controversial. Since this issue is widely discussed, we will not dwell on it here, but only note that even if this opinion is true, the creation of an "algorithmically universal" AI would be a significant step forward, and, moreover, at present there are no technical means that could embody noncomputable processes useful for intelligence.

Conclusions


Ideal minimal intelligence models perform a theoretically optimal (with certain reservations) choice of actions to maximize the objective function, based on the history of interaction with the environment and using universal prediction founded on algorithmic probability in an a priori uncertain environment. The externally assigned objective function, the classical computability of the algorithms, the neglect of multi-agent interactions, the determinism of the choice of actions, the use of exact environment models, the neglect of the possibility of reducing uncertainty in the world models when choosing actions, and the lack of reflection are the features of the considered IMI model that call its universality into question. However, these aspects look less serious than the problem of the inefficiency (or even incomputability) of this model. Moreover, they can be analyzed in more detail after the introduction of resource constraints.

Literature


(Hutter, 2005) Hutter M. Universal Artificial Intelligence. Sequential Decisions Based on Algorithmic Probability / Springer. 2005. 278 p.

(Hutter, 2007) Hutter M. Universal Algorithmic Intelligence: A Mathematical Top→Down Approach // in Artificial General Intelligence. Cognitive Technologies, B. Goertzel and C. Pennachin (Eds.). Springer. 2007. P. 227–290.

(Li and Vitanyi, 1997) Li M., Vitanyi P. An Introduction to Kolmogorov Complexity and Its Applications. 2nd ed. NY: Springer-Verlag, 1997.

(Orseau and Ring, 2011) Orseau L, Ring M. Self-Modification and Artificial Agents // Lecture Notes in Computer Science 6830 (proc. Artificial General Intelligence - 4th International Conference). Springer, 2011. P. 1–10.

(Ring and Orseau, 2011) Ring M., Orseau L. Delusion, Survival, and Intelligent Agents // Lecture Notes in Computer Science 6830 (proc. Artificial General Intelligence - 4th International Conference). Springer, 2011. P. 11–20.

(Schmidhuber, 2003) Schmidhuber J. The New AI: General & sound & relevant for physics. Technical Report TR IDSIA-04-03, Version 1.0, cs.AI/0302012 v1, IDSIA. 2003

(Solomonoff, 1986) Solomonoff R. The Application of Algorithmic Probability to Problems in Artificial Intelligence // In: L.N. Kanal and J.F. Lemmer (Eds.). Uncertainty in Artificial Intelligence. Elsevier Science Publishers, 1986. P. 473–491.

(Solomonoff, 1997) Solomonoff R. Does Algorithmic Probability Solve the Problem of Induction? // Oxbridge Research, POB 391887, Cambridge, Mass. 02139. 1997.

(Potapov, 2007) Potapov A.S. Pattern recognition and machine perception: a general approach based on the principle of minimum description length. St. Petersburg: Polytechnic, 2007. 548 p.

Source: https://habr.com/ru/post/150056/

