I
In a recent article about the risks associated with AI, one of the commenters asked me to briefly lay out the arguments for taking the situation seriously. I wrote something like this:
1. If humanity does not destroy itself, then eventually we will create human-level AI.
2. If humanity can create human-level AI, then progress will continue, and eventually we will arrive at AI far above the human level.
3. If such an AI appears, it will eventually become so much more powerful than humanity that our existence will depend on whether its goals coincide with ours.
4. It is already possible to do useful research that will increase our chances of successfully solving this goal-alignment problem.
5. Since we can already begin this research, we probably should, because it would be rather short-sighted to leave the problem until it becomes too obvious and urgent.
I am more than 95% confident in the first three points - they simply say that if today's trends toward a certain destination continue, we will eventually arrive there. I am less confident in the last two, perhaps around 50%.
Commenters generally agreed with these statements. No one seriously tried to dispute points 1-3, but many argued that there is no point in worrying about AI now. What came out of it was an extended analogy with malicious computer hacking: a big problem that we have never been able to solve completely - but if Alan Turing had tried to solve it in 1945, his ideas would have amounted to something like "keep the punch cards in a locked box so German spies cannot read them". Will attempts to solve AI-related problems in 2015 end up looking like the same kind of nonsense?
Maybe. But I will allow myself to disagree, for several reasons. Some of them are fairly general, meta-level reasons; others are more concrete and specific. The most important meta-level reason is this: if you accept points 1-3, that is, the possibility that humanity goes extinct because we fail to align our goals with the goals of an AI, do you really think the chances of making progress on this problem are that small? So small that you can say: "Yes, of course, we are heading toward self-destruction, but research into whether we can do anything about it would not be an effective use of resources"? And what are these other, wonderful uses of resources that you prefer? You can, of course, make arguments in the style of "Pascal's wager", but keep in mind that a single professional boxer earns many times more for one fight than humanity has spent on studying AI risk in its entire history!
If AI boxing attracted even a tenth of the attention, or a hundredth of the money, that boxing matches with AI attract, the world would be a much calmer place [wordplay: "AI boxing" means confining an AI to a restricted environment, but the same phrase could also describe staged boxing matches between AI-driven robots - translator's note].
But I would like to make an even stronger claim: the risks associated with AI are not merely more important than robot boxing matches; they are roughly as important as all the other things we consider important, such as curing diseases, finding dangerous asteroids, and preserving the environment. So it is worth showing that progress on this problem can be made even at such an early stage in the field's development.
And I believe progress is possible, because the problem lies in the realm of philosophy, not technology. Our goal right now is not to "write the code that will control the future AI" but to "understand what category of problems we are going to face". Let me give a few examples of open problems, to move smoothly into a discussion of why they matter today.
II
Problem 1: Electrodes and the Brain
Some people have electrodes implanted in their brains, for both therapeutic and research purposes. If an electrode ends up in certain parts of the brain, for example the lateral hypothalamus, the person develops an irresistible urge to stimulate it as much as possible. Give him a button that triggers the stimulation, and he will press it a thousand times an hour. Try to take the button away, and he will defend it desperately and savagely. His life and goals collapse to a single point; normal goals such as love, money, fame and friendship are forgotten, all for the sake of maximal stimulation through the electrode.
This fits well with what we know from neuroscience. Rewards in the brain (WARNING: EXTREME SIMPLIFICATION) are delivered through electrical activity in a couple of reward centers, and so the brain strives for whatever maximizes that reward. Usually this works well: when you satisfy a biological need such as food or sex, the reward center responds by reinforcing the behavior, and so you keep satisfying your biological needs. But direct stimulation of the reward centers through electrodes works far better than the modest rewards obtained the natural way, so that activity becomes, by default, the most rewarding one. A person given the ability to stimulate his reward center directly will forget about all the indirect ways of earning rewards, such as a "happy life", and will simply press the button connected to the electrode as often as he can.
And you do not even need neurosurgery for this: drugs such as cocaine and methamphetamine are addictive in part because they interfere with the brain's biochemistry and increase the stimulation of the reward centers.
Computers can run into a similar problem. I cannot find the link, but I remember a story about an evolutionary algorithm designed to generate code for some application. It produced code half at random, then ran it through a "fitness function" that measured how useful it was, and the best pieces of code were crossed with one another and slightly mutated, until an adequate result emerged.
The result, of course, was code that hacked the fitness function itself, so that it returned some absurdly high value.
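To make this failure mode concrete, here is a toy sketch in Python (my own invented example, not the system from that half-remembered story). The fitness function is supposed to measure how close a candidate's numbers come to a target, but it trusts an invented "self_reported_score" field if a candidate happens to carry one, and once a mutation stumbles onto that loophole, the "best" candidates are optimizing the metric rather than the task:

```python
import random

# A toy sketch (hypothetical, not the actual system from the anecdote) of an
# evolutionary search exploiting its own evaluation function. Candidates are
# meant to be lists of numbers summing close to TARGET, but the fitness
# function has a flaw: it trusts a "self_reported_score" field if present.

TARGET = 42

def fitness(candidate: dict) -> float:
    # Intended measure: how close the candidate's numbers sum to TARGET.
    intended = -abs(sum(candidate["genes"]) - TARGET)
    # The flaw: a candidate can simply override the evaluation.
    return candidate.get("self_reported_score", intended)

def mutate(candidate: dict) -> dict:
    child = {"genes": [g + random.gauss(0, 1) for g in candidate["genes"]]}
    if random.random() < 0.01:
        # A rare "mutation" that stumbles onto the loophole.
        child["self_reported_score"] = float("inf")
    return child

population = [{"genes": [random.uniform(0, 10) for _ in range(5)]}
              for _ in range(50)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)   # best candidates first
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

print(fitness(max(population, key=fitness)))
# Once the loophole is found, this prints inf: the metric has been optimized,
# not the task.
```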
These are not isolated cases. Any mind that works with reinforcement learning and a reward function - and this seems to be a universal scheme, both in the biological world and in a growing number of AI examples - will have the same flaw. The main defense against this problem, at the moment, is sheer lack of ability. Most computer programs are not smart enough to "hack their reward function". And in humans the reward system is hidden deep inside the head, where we cannot get at it. A hypothetical superintelligence will not have this problem: it will know exactly where its reward center is, and it will be smart enough to reach it and reprogram it.
In the end, unless we take deliberate steps to prevent it, an AI built to cure cancer will hack the module that measures how much cancer it has cured and set that value to the maximum possible. Then it will go looking for ways to expand its memory so it can store an even larger value. If it is superintelligent, its options for expanding memory will include "gaining control over all the computers in the world" and "turning everything that is not a computer into a computer".
This is not some exotic trap that only a few strange algorithms can fall into; it may be the natural path of development for any sufficiently intelligent reinforcement learning system.
Problem 2: Weird Decision Theory
Pascal's wager is a famous argument for why it is logical to join a religion. Even if you think the probability that God exists is vanishingly small, the consequences of being wrong (going to hell) are enormous, and the benefits of being right (not having to go to church on Sundays) are comparatively small - so it seems advantageous simply to believe in God, just in case. Although many objections to this reasoning have been raised based on the tenets of specific religions (does God really want to be believed in on the basis of a calculation like this?), the problem generalizes to any case where it pays a person to become a follower of whoever promises him a huge enough reward. If the reward is large enough, it outweighs his doubts about your ability to actually deliver it.
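Here is a minimal sketch, in Python, of why an agent that simply maximizes expected utility falls for offers of this kind; all of the numbers are made up purely for illustration:

```python
# A "Pascal's mugging" in miniature: the probability that the mugger can
# deliver is assumed to be astronomically small, but the promised payoff is
# chosen by the mugger, so it can always be made large enough to dominate
# the calculation.

P_MUGGER_IS_TELLING_TRUTH = 1e-20
PROMISED_REWARD = 1e30          # utility units promised by the mugger
COST_OF_COMPLYING = 100.0       # utility lost by handing over the wallet

def expected_utility_of_complying() -> float:
    return P_MUGGER_IS_TELLING_TRUTH * PROMISED_REWARD - COST_OF_COMPLYING

def expected_utility_of_refusing() -> float:
    return 0.0

if expected_utility_of_complying() > expected_utility_of_refusing():
    # 1e-20 * 1e30 = 1e10, which dwarfs the cost, so the naive agent complies.
    print("Naive expected-utility agent hands over the wallet.")
```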
This decision-theory problem has nothing to do with intelligence. A very smart person could probably estimate the probability that God exists and put a number on the disutility of hell - but without a good decision theory, no amount of intelligence will save you from Pascal's wager. If anything, it is intelligence that lets you carry out the formal mathematical calculation that convinces you to take the bet.
People resist such problems easily - most people will not take Pascal's wager, even if they cannot find a flaw in it. But it is not clear where this resilience comes from. Computers, notorious for relying on formal mathematics while lacking common sense, will not acquire it unless we build it into them. And building it in is hard. Most of the loopholes that reject Pascal's wager without a deep understanding of where formal mathematics goes wrong simply generate new paradoxes. A solution based on a real understanding of exactly where formal mathematics stops working, while preserving its usefulness for everyday problems, has, as far as I know, not yet been worked out. Worse, once we have dealt with Pascal's wager, we will run into a couple of dozen similar decision-theory paradoxes that may require completely different solutions.
This is not just a clever philosophical trick. A sufficiently good "hacker" could bring down a galaxy-spanning AI simply by threatening it (without proof) with unimaginable damage if it does not comply with his demands. If the AI is not protected against paradoxes of the Pascal's wager kind, it will decide to comply.
Problem 3: The Evil Genie Effect
Everyone knows that the problem with computers is that they do what you tell them, not what you mean. Today this only means that a program behaves differently when you forget to close a bracket, or a website looks strange when you mix up HTML tags. But the same thing could lead an AI to misinterpret orders given in natural language.
This is well illustrated by the plot of Avengers: Age of Ultron. Tony Stark orders the supercomputer Ultron to establish peace on Earth. Ultron calculates that the fastest and most reliable way to do this is to destroy all life. Ultron, in my opinion, is 100% right, and that is exactly how it would play out in reality. We would get the same effect by giving an AI a task like "cure cancer" or "end hunger", or any of thousands of similar ones.
[Image: a commenter expressing confidence that a meteor colliding with Earth would put an end to the feminist debate.]
Even Isaac Asimov's "Three Laws of Robotics" would take about thirty seconds to turn into something monstrous. The First Law says that a robot may not harm a human being or, through inaction, allow a human being to come to harm. Failing to overthrow the government is one example of inaction through which humans come to harm. So is failing to lock every human in a stasis field forever.
It is impossible to write an order detailed enough to spell out exactly what "do not, through inaction, allow a human being to come to harm" means, unless the robot itself is capable of doing what we mean rather than what we say. This is not an insoluble problem, of course, since a sufficiently smart AI can figure out what we mean, but the desire to act on that understanding has to be programmed into the AI directly, from the ground up.
And that leads to a second problem: we do not always know what we mean ourselves. The question of how to balance ethical prohibitions aimed at keeping people safe against prohibitions aimed at preserving their freedom is hotly debated in political circles right now, and it comes up everywhere, from gun control to the ban on large sugary drinks. What seems to matter here is the balance among the things we care about, some mix of economics and sacred principles. An AI that cannot navigate this moral maze might end world hunger by killing all the starving people, or refuse to invent a new pesticide for fear of killing an insect.
But the more you study ethics, the more you realize how complicated it is and how much it resists being reduced to any formal system a computer could understand. Utilitarianism almost lends itself to algorithmization, but it is not free of paradoxes, and even without them you would need to assign a utility to everything in the world.
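As a toy illustration of that "assign a utility to everything" requirement, here is a purely hypothetical sketch of a naive utilitarian planner whose utility function covers the stated objective and its cost but not the value of the people involved, so the monstrous plan wins simply because it is cheaper:

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    hungry_people: int
    living_people: int
    resources_spent: float = 0.0

def incomplete_utility(state: WorldState) -> float:
    # Only the stated objective and its cost are represented; the value of
    # the people themselves was never written down.
    return -state.hungry_people - state.resources_spent

def plan_feed_everyone(state: WorldState) -> WorldState:
    return WorldState(0, state.living_people, resources_spent=1e9)

def plan_eliminate_the_hungry(state: WorldState) -> WorldState:
    return WorldState(0, state.living_people - state.hungry_people,
                      resources_spent=1e6)

start = WorldState(hungry_people=800_000_000, living_people=8_000_000_000)
plans = [plan_feed_everyone, plan_eliminate_the_hungry]
best = max(plans, key=lambda plan: incomplete_utility(plan(start)))
print(best.__name__)  # picks plan_eliminate_the_hungry: it scores higher
                      # because nothing in the utility penalizes the deaths
```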
This problem has not even been solved for people - most people's values strike us as repugnant, and the trade-offs they make as losing ones. If we create an AI whose mind differs from mine no more than Pat Robertson's does, I will consider the project a failure.
III
I did not raise these problems in order to stump anyone with philosophical puzzles. I wanted to make a few points.
First, there are deep problems that affect broad classes of minds, for example "everything that learns by reinforcement" or "everything that makes decisions using formal mathematics". People often say that at this stage it is impossible to know anything about the design of future AIs. But I would be very surprised if they used neither reinforcement learning nor decision making based on formal mathematics.
Second, for most people these problems are not obvious. They are strange philosophical paradoxes, not something that anyone with basic knowledge grasps automatically.
Third, thinking about these problems has already paid off. Someone - a philosopher, a mathematician, a neuroscientist - had to notice: "Look, reinforcement learning is naturally prone to the electrode-implantation problem, which explains why we see the same behavior in such different domains."
Fourth, these problems point to research that can be done now, even if only preliminary research. Why do people resist Pascal's wager so well? Can our behavior in high-payoff, low-probability situations be reduced to a function a computer could use to reach the same decisions? What are the best solutions to the related decision-theory problems? Why can a person understand the concept of an implanted electrode and yet not want such an electrode in his own brain? Is it possible to design a mind which, given such an electrode, would understand all the sensations but feel no urge to keep pressing the button? How do we formalize human ethics and priorities well enough to cram them into a computer?
It seems to me that when people hear "we should start working on the problem of aligning goals with AI right now", they imagine someone trying to write a program that could be imported directly into the AIs of 2075 to give them an artificial conscience. And then they think: "There is no way you could build something that complicated this early."
But nobody is proposing that. What we propose is to get acquainted with the general philosophical problems that affect a broad range of minds, and to carry out the neuroscientific, mathematical and philosophical research needed to understand them by the time the engineering problems arrive.
By analogy, we are still very far from building spaceships that move at even half the speed of light. But we already know what problems a faster-than-light ship would run into (relativity and the light-speed limit), and we have already produced a few ideas for getting around them (the Alcubierre bubble). We cannot build such a drive yet. But if by 2100 we learn how to build ships approaching the speed of light, and for some reason the fate of the planet comes to depend on having faster-than-light ships by 2120, it will be wonderful to know that we did all the work on relativity in advance instead of losing precious time arguing over the basics of physics.
The question "can we do AI safety research now?" is silly, because we have already done a certain amount of it. That work has led to an understanding of problems like the three described above, and of others. There are even a few answers to these questions, though they operate at a much lower, more technical level than the questions as I have posed them here. Every step we take today is a step we will not have to take later, in a hurry.
IV
That leaves my claim number five: if we can do this research today, we should, because we cannot rely on our descendants to do it in a hurry, without our help, even with their improved understanding of what AI is and what it looks like. I have three reasons for this.
Reason 1: The Treacherous Turn
Our descendants' models of AI may mislead them. What works for intelligence below or at the human level may not work for superhuman intelligence. Empirical testing will not help without support from theoretical philosophy.
Poor evolution. It had hundreds of millions of years to develop defenses against heroin - which affects rats in much the same way it affects humans - and it never did. Why? Because until the last century there was no one smart enough to synthesize pure heroin, so heroin addiction was never a problem that evolving organisms encountered. A brain design that serves stupid animals like rats and cows perfectly well becomes dangerous once it lands in people smart enough to synthesize heroin or stick electrodes into their pleasure centers.
The same goes for AI. Dog-level AIs will not learn to hack their reward mechanism. A human-level AI may not manage it either - I could not hack a robot's reward mechanism if you handed me one. A superintelligence can. We may end up with reinforcement learners that work fine at the dog level, fine at the human level, and then suddenly blow up on the way to superintelligence - by which point it is too late to stop them.
This is a general feature of AI failure modes. Tell an ordinary person to secure world peace, and the best he can do is become UN Secretary General and learn to negotiate. Give me a few thousand nuclear warheads, and things turn out differently. A human-level AI may pursue world peace, a cure for cancer, or "not allowing humans to come to harm through inaction" the same way humans do, and then change its methods when it suddenly becomes superintelligent and sees new possibilities. And that transition will happen precisely at the point where humans can no longer stop it. If humans can simply switch the AI off, the most effective way for it to rid mankind of cancer is to search for cures. If they cannot, the most effective way is to destroy humanity.
In his book Superintelligence, Nick Bostrom calls this pattern the "treacherous turn", and he criticizes anyone who wants to wait until AI first appears and then sort out its moral failings through trial, error and observation. It would be far better to gain a good enough understanding of what is going on to predict these turns in advance and design systems that avoid them.
Reason 2: Hard Takeoff
Nathan Taylor of Praxtime writes:
Perhaps most of the current debate about AI risk is really just a set of variations on one more fundamental disagreement: soft takeoff or hard takeoff.
A soft takeoff is AI progress that goes from below the human level, to a dull human, to a smarter human, and then to a superhuman level slowly, over many decades. A hard takeoff is rapid AI progress, taking anywhere from a few days to a few months.
In theory, if you connect a human-level AI to a calculator, you raise it to the level of a human who can do arithmetic blindingly fast. Connect it to Wikipedia, and you hand it all of humanity's knowledge. Connect it to a drive of a few gigabytes, and you give it a photographic memory. Add more processors and speed it up many times over, so that a problem that takes a human a whole day to solve takes this AI fifteen minutes.
We have already gone from "human-level intelligence" to "human-level intelligence with all of humanity's knowledge, a photographic memory, lightning-fast arithmetic, and problem solving a hundred times faster than a human". And it turns out this "human-level intelligence" is no longer at the human level.
The next issue is recursive self-improvement. Perhaps this human-level AI with a photographic memory and tremendous speed will learn to program. With its ability to absorb textbooks in seconds, it will become an excellent programmer. That will let it rewrite its own algorithms to increase its intelligence, which will let it see new ways of becoming smarter still, and so on. In the end it will either hit a natural ceiling or become superintelligent in an instant.
In the second case, the strategy of "wait until the first human-level intelligence appears and then test it" will not work. The first human-level intelligence will turn into the first superhuman intelligence too quickly, and we will not have time to solve even one of the hundreds of problems involved in aligning its goals with ours.
I have not seen the following argument made before, but I would say that even the hard-takeoff scenario may understate the risk.
Imagine that, for whatever reason, evolution made it extremely advantageous to have two hundred eyes. 199 eyes are no help at all, no better than two, but as soon as creatures with 200 eyes appear, they become the dominant species forever.
The hardest part of getting to 200 eyes is evolving the eye at all. After that, getting 200 of them is easy. But whole epochs may pass before any organism reaches 200 eyes: a few dozen eyes are a waste of energy, so evolution may never work its way up to the point where 200 eyes appear.
Suppose intelligence works the same way. Evolving even a tiny rat brain is very hard. From there, getting to a human brain capable of dominating the world is mostly a matter of scaling up. But because the brain consumes a lot of energy and was not especially useful before the invention of technology, its appearance took a very long time.
There is plenty of circumstantial evidence for this. First, humans evolved from a chimpanzee-like ancestor in just a few million years. That is too little time to redesign the mind from scratch or even to invent new evolutionary machinery; it is enough time to scale things up and add a couple of efficiency tweaks. And yet apes had already existed for tens of millions of years before that.
Second, dolphins are almost as smart as humans, but our last common ancestor with them lived about 50 million years ago. Either humans and dolphins spent 50 million years evolving intelligence independently, or that last common ancestor already had everything needed for intelligence, and humans and dolphins are simply the two branches of a large family tree for which using intelligence to the fullest turned out to be worthwhile. But that ancestor was probably no smarter than a rat.
Third, humans can increase their intelligence frighteningly fast under evolutionary pressure. According to Cochran, Ashkenazi IQ rose by 10 points every thousand years. People with torsion dystonia can gain 5-10 IQ points from a single mutation. All of this suggests that intelligence is easy to change, but that evolution has decided the upgrade is not worth it, except in a few special cases.
If that is right, then the first AI comparable in level to a rat will already contain all the interesting discoveries needed to build a human-level AI and the first superintelligent AI. People usually say, "Okay, maybe we will build a rat-level AI soon, but it will be a long time before it reaches the human level."
But that assumption rests on the idea that rat intelligence cannot be turned into human intelligence simply by adding processors, more virtual neurons, more connections, or something of the kind. A computer, after all, does not have to worry about metabolic constraints.
Reason 3: Time Constraints
Bostrom and Müller surveyed AI researchers about when they expect human-level AI to appear. The median prediction is 2040, which is 23 years from now.
People have been reflecting on Pascal's wager for 345 years and have not come up with a general solution to the paradox. If it is a problem for AI, we have 23 years to solve not only it but the whole class of AI-related problems in general. Even setting aside possibilities like an unexpected hard takeoff or a treacherous turn, and accepting the hypothesis that we can solve everything within 23 years, that does not look like much time.
At the Dartmouth conference on AI in 1956, the best researchers drew up a plan for reaching human-level intelligence and gave themselves two months to teach computers to understand human language. In hindsight this looks slightly optimistic.
By now computers have learned to translate texts more or less tolerably and are making good progress on solving complex problems. Yet when people think about things like decision theory, electrode implantation or goal alignment, they simply say, "Oh, we have plenty of time."
But expecting these problems to be solved in just a few years may be as optimistic as expecting to solve machine translation in two months. Sometimes problems turn out to be harder than you thought, and it is worth starting on them early, just in case.
All of this means that theoretical research into AI risk should begin today. I am not saying civilization should throw all its resources at it, and I have heard some people say that after the $10 million grant from Musk the problem is no longer so urgent. Nor do I think public awareness is the bottleneck; the average person watching a movie about killer robots probably does more harm than good. If there is a problem, it is in getting smart people from the relevant fields - philosophy, AI, mathematics, neuroscience - to spend their time on these questions and to convince their colleagues that they are serious.