Why self-taught artificial intelligence has trouble with the real world
The latest AI systems start out knowing nothing about the game they are playing and grow to world-class level in a matter of hours. But researchers are struggling to apply such systems beyond the world of games.
Until recently, the machines capable of trouncing human champions at least had the respect to learn the game by drawing on human experience.
To beat Garry Kasparov at chess in 1997, IBM engineers drew on centuries of chess wisdom in building their Deep Blue computer. In 2016, Google DeepMind's AlphaGo program defeated champion Lee Sedol at the ancient board game Go after digesting millions of positions collected from tens of thousands of human games.
But now AI researchers are rethinking how their bots should absorb human knowledge. The current trend can be summed up as: don't bother.
Last October, the DeepMind team published details of a new Go-playing system, AlphaGo Zero, which didn't study humans at all. It started with nothing but the rules of the game and played against itself. Its first moves were completely random; after each game, it folded in new knowledge about what led to a win and what didn't. After these matches, AlphaGo Zero was pitted against the already superhuman version of AlphaGo that had defeated Lee Sedol. The former beat the latter 100 games to 0.

Lee Sedol, 18-time world Go champion, at the 2016 match against AlphaGo.
The team went on to create the next genius player in the AlphaGo family, this time called simply AlphaZero. In a paper published on arxiv.org in December, DeepMind researchers revealed how AlphaZero, again starting from scratch, trained itself and defeated AlphaGo Zero — that is, it beat the bot that beat the bot that beat the world's best human players. And when given the rules of shogi, the Japanese variant of chess, AlphaZero quickly learned the game and managed to beat the best algorithms custom-built for it. Experts marveled at the program's aggressive and unfamiliar style. “I always wondered what it would be like if a superior species landed on Earth and showed us how they play chess,” Danish grandmaster Peter Heine Nielsen told the BBC. “Now I know.”
The past year also saw otherworldly bots prove themselves in settings as diverse as no-limit poker and Dota 2, a popular online game in which fantasy heroes battle for control of an alien world.
Naturally, the ambitions of the companies pouring money into such systems extend beyond dominating game championships. Research teams like DeepMind hope to apply similar methods to real-world problems — creating superconductors that work at room temperature, or understanding the origami of protein folding to create molecules useful as drugs. And, of course, many practitioners hope to build a general-purpose artificial intelligence — a loosely defined but captivating goal of giving a machine the ability to think like a person and flexibly attack many different kinds of problems.
However, despite all the investment, it is not yet clear how far current techniques can reach beyond the game board. “I'm not sure the ideas behind AlphaZero will generalize so readily,” says Pedro Domingos, a computer scientist at the University of Washington. “Games are a very, very unusual thing.”
Perfect goals for an imperfect world
One feature shared by many games, chess and Go included, is that both players can see all the pieces on the board at all times. Each player has what's called “perfect information” about the state of the game. However complex the game gets, all you need to do is reason forward from the current position.
Many real-world situations aren't like that. Imagine asking a computer to make a medical diagnosis or conduct a business negotiation. “Most strategic interactions in the real world involve hidden information,” says Noam Brown, a computer science doctoral student at Carnegie Mellon University. “I feel like that's been ignored by most of the AI community.”
Poker, Brown's specialty, poses a different challenge: you can't see your opponent's cards. But here, too, machines that learn by playing against themselves are already reaching superhuman heights. In January 2017, the Libratus program, created by Brown and his adviser Tuomas Sandholm, beat four professional players at no-limit Texas Hold'em, finishing $1.7 million ahead at the end of the 20-day competition.
An even more daunting game of imperfect information is StarCraft II, another multiplayer online game with a huge following. Players choose a faction, build an army and wage war across a science-fiction landscape. But that landscape is shrouded in a fog of war: players see only the parts of the territory where they have their own troops or buildings. Even the decision to scout an opponent's territory is fraught with uncertainty.
This is one game AI still can't win. The obstacles include the sheer number of possible moves in a game, which usually runs past a thousand, and the pace of decision-making. Each player — human or machine — has to worry about a vast set of possible futures with every click of the mouse.
For now, AI can't compete with top humans on equal footing in this arena. But it is a target for AI development. In August 2017, DeepMind partnered with Blizzard Entertainment, the company behind StarCraft II, to release tools that, they say, will open the game up to AI researchers.
For all its difficulty, StarCraft II boils down to a simple objective: destroy your enemy. That makes it kin to chess, Go, poker, Dota 2 and just about every other game. Games can be won.
From an algorithm's point of view, a problem needs to have an “objective function,” a goal to pursue. When AlphaZero played chess, this was easy: a loss counted as minus one, a draw as zero, a win as plus one. AlphaZero's objective function was to maximize its score. A poker bot's objective function is just as simple: win lots of money.
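As a rough illustration, an objective function of this kind is only a few lines of code. The sketch below is hedged: it is not AlphaZero's actual implementation, and `estimated_value` is a hypothetical stand-in for whatever position evaluator the agent uses.

```python
# A minimal sketch of a game objective function: terminal outcomes map
# to -1 / 0 / +1, and the agent picks whichever legal move maximizes
# its expected score. All names here are illustrative.

def terminal_reward(outcome: str) -> int:
    """Score a finished game: the quantity the agent tries to maximize."""
    return {"loss": -1, "draw": 0, "win": +1}[outcome]

def best_move(state, legal_moves, estimated_value):
    """Choose the move whose resulting position has the highest
    estimated expected score, according to some evaluator."""
    return max(legal_moves, key=lambda move: estimated_value(state, move))
```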
Simulated walkers can learn complex behaviors, such as crossing unfamiliar terrain.
Situations in real life are not so straightforward. A robot, for example, needs a subtler objective function — something like the careful choice of words you'd use when describing a wish to a genie. For example: promptly deliver the passenger to the correct address, obeying all laws and appropriately weighing the value of human life in dangerous and uncertain situations. Domingos says that how researchers craft the objective function is “one of the things that distinguishes a great machine-learning researcher from an average one.”
Consider Tay, the Twitter chatbot Microsoft released on March 23, 2016. Tay's goal was to engage people in conversation, and engage them it did. “What Tay unfortunately discovered,” Domingos said, “is that the best way to maximize engagement is to spew racist insults.” It was switched off barely a day after it launched.
Your own worst enemy
Some things haven't changed. The strategies used by today's dominant game bots were invented decades ago. “It's almost a blast from the past — they've just given it more computing power,” says David Duvenaud, a computer scientist at the University of Toronto.
These strategies often rest on reinforcement learning, a hands-off technique. Instead of micromanaging the algorithm's every detail, engineers let the machine explore an environment and learn to achieve goals on its own, by trial and error. Before AlphaGo and its heirs, the DeepMind team scored its first big headline-grabbing success in 2013, when it used reinforcement learning to build a bot that learned to play seven Atari 2600 games — three of them at expert level.
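The trial-and-error loop at the heart of reinforcement learning can be sketched in a few lines. This is a generic tabular Q-learning toy, not DeepMind's Atari system (which used a deep network); the `env` object, with its `reset`, `step` and `actions` methods, is a hypothetical interface assumed for the sketch.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn action values purely by trial and error."""
    q = defaultdict(float)  # (state, action) -> estimated long-term value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Occasionally try a random action (explore); otherwise
            # take the best-looking one so far (exploit).
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Nudge the estimate toward the reward plus discounted future value.
            best_next = max((q[(next_state, a)] for a in env.actions(next_state)),
                            default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q
```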
That progress has continued. On February 5, DeepMind released IMPALA, an AI system that can learn 57 Atari 2600 games plus another 30 three-dimensional levels built by DeepMind, in which a player roams different environments pursuing goals like opening doors or harvesting mushrooms. IMPALA appeared to transfer knowledge between tasks — time spent on one game also improved performance on the others.
But within the broader category of reinforcement learning, board games and multiplayer games permit an even more specific approach. Their training can take the form of self-play, in which an algorithm attains strategic superiority by repeatedly competing against a close copy of itself.
The idea is decades old. In the 1950s, IBM engineer Arthur Samuel created a checkers program that learned partly by playing against itself. And in the 1990s, Gerald Tesauro, also of IBM, built a backgammon program that pitted the algorithm against itself. The program reached the level of human experts, devising unusual but effective strategies along the way.
In game after game, a self-play algorithm is supplied with an equally matched opponent. That means a change in strategy leads to a different outcome, giving the algorithm instant feedback. “Whenever you learn something, whenever you discover some little trick, your opponent immediately starts using it against you,” says Ilya Sutskever, research director at OpenAI, the nonprofit he co-founded with Elon Musk to develop and share AI technologies and steer them in a safe direction. In August 2017, the organization released a Dota 2 bot that controlled one of the game's characters, Shadow Fiend, a demonic necromancer, and beat the world's best players in one-on-one battles. Another OpenAI project pits simulated humans against each other in a sumo match, through which they teach themselves to tackle and feint. During self-play, “you can never rest, you have to constantly improve,” Sutskever said.
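The core of such a self-play loop can be sketched in a few lines. This is a hedged, generic illustration — the function names (`play_game`, `update`) are hypothetical placeholders, not OpenAI's or DeepMind's code.

```python
import copy

def self_play_training(agent, play_game, update,
                       generations=100, games_per_gen=50):
    """Improve an agent by repeatedly playing a frozen copy of itself."""
    for _ in range(generations):
        opponent = copy.deepcopy(agent)  # snapshot: an equally matched rival
        for _ in range(games_per_gen):
            trajectory, outcome = play_game(agent, opponent)
            # The win/loss signal is instant feedback on any new trick.
            update(agent, trajectory, outcome)
    return agent
```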
But the old idea of self-play is only one ingredient of today's dominant bots; they also need a way to translate their gaming experience into a deeper understanding. Chess, Go and video games like Dota 2 have more permutations than there are atoms in the universe. Even across several human lifetimes spent battling its own shadow in virtual arenas, a machine could never encounter every scenario, record it in a lookup table and consult the table when the situation arose again.
To stay afloat in this sea of possibilities, “you need to generalize, to capture the essence,” says Pieter Abbeel, a computer scientist at the University of California, Berkeley. IBM's Deep Blue did this with its built-in chess formula. Armed with the ability to evaluate the strength of board positions it had never seen, the program could adopt moves and strategies that improved its chances of winning. In recent years, however, a new technique has made it possible to dispense with the formula altogether. “Now, all of a sudden, the deep net captures all of this,” Abbeel said.
Deep neural networks, whose popularity has soared in recent years, are built from layers of artificial “neurons” stacked like a pile of pancakes. When a neuron in one layer fires, it sends signals up to the next layer, where they can trigger further signals, and so on.
By tuning the connections between layers, these networks become astonishingly good at transforming inputs into related outputs, even when the relationship seems abstract. Give them a phrase in English and they can train themselves to translate it into Turkish. Give them images of animal shelters and they can identify which ones house cats. Show them a game position and they can estimate the probability of winning. Typically, though, such networks first need lists of labeled examples to practice on.
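To make the “stack of pancakes” picture concrete, here is a toy forward pass through a two-layer network. The weights are random purely for the sake of the sketch; a real network would tune them on labeled examples, as described above.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)  # first layer of "neurons"
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)   # output layer

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)  # neurons "fire" (ReLU activation)
    return W2 @ h + b2                # signals passed up to the next layer

print(forward(rng.normal(size=8)))   # one input mapped to one output
```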
That's why self-play and deep neural networks fit together so well. Games played against oneself churn out enormous numbers of scenarios, giving the deep network a virtually unlimited supply of training data. In turn, the deep network offers a way to internalize the experiences and patterns encountered during play.
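Put together, the two ideas form a simple cycle: self-play generates positions labeled with eventual outcomes, and the network trains on them, sharpening the evaluations used in the next round of self-play. The sketch below is a loose, simplified illustration in that spirit, not DeepMind's published algorithm; `play_game` and `fit` are hypothetical placeholders.

```python
def self_play_with_network(network, play_game, fit, iterations=100):
    """Alternate between generating data by self-play and training on it."""
    for _ in range(iterations):
        # 1. Self-play yields positions labeled with the final outcome
        #    (-1, 0 or +1 from the player-to-move's perspective).
        positions, outcomes = play_game(network, network)
        examples = list(zip(positions, outcomes))
        # 2. The network learns to predict outcomes from positions,
        #    improving the move evaluations used in the next iteration.
        fit(network, examples)
    return network
```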
But there's a catch. For such systems to produce useful data, they need a realistic place to play.
"All of these games, all of these results, were achieved under conditions that allowed the world to simulate perfectly," said Chelsea Finn, a Berkeley graduate student who uses AI to manipulate robotic hands and interpret data from sensors. Other areas are not easy to imitate.
Self-driving cars, for example, struggle with bad weather and with cyclists. Or they may fail to perceive unusual possibilities found in the real world — such as a bird flying straight into the camera. With robotic arms, Finn says, initial simulations provided basic physics that allowed the arm to learn how to learn. But they fail to capture the fine details of touching different surfaces, so tasks like twisting on a bottle cap — or performing delicate surgery — demand real-world experience.
For problems that are hard to simulate, self-play is no longer so helpful. “There's a big difference between a truly perfect model of the environment and a learned, approximate model, especially when reality is genuinely complex,” Yoshua Bengio, a deep-learning pioneer at the University of Montreal, wrote to me. But AI researchers still have ways to press forward.
Life after games
It is hard to pinpoint exactly when AI's superiority at games began. You could pick Kasparov's loss at chess, or Lee Sedol's defeat at the virtual hands of AlphaGo. Another popular choice is the day in 2011 when the legendary Jeopardy! champion Ken Jennings lost to IBM's Watson. Watson could handle the show's clues and wordplay. “I, for one, welcome our new computer overlords,” Jennings wrote beneath his final answer.
Watson seemed endowed with the kind of office skills people use to solve many real-world problems: it could take input in English, sift through relevant documents in the blink of an eye, extract coherent pieces of information and settle on a single best answer. But seven years on, the real world keeps throwing up stubborn obstacles for AI. A September report in the health-news publication Stat indicated that Watson's heir, Watson for Oncology — which specializes in cancer research and personalized treatment recommendations — had run into problems.
“Questions in Jeopardy! are easier to handle because they don't require common sense,” wrote Bengio, who has collaborated with the Watson team, when asked to compare the two cases from an AI perspective. “Understanding a medical article is much harder. A lot of basic research is required.”
But however narrow and specialized games may be, they do resemble quite a few real problems. DeepMind researchers declined interview requests, noting that their AlphaZero work is currently under peer review. But the team has suggested that such techniques could soon help biomedical researchers who want to understand protein folding.
To do that, they need to figure out how the various amino acids that make up a protein twist and fold into a tiny three-dimensional machine whose function depends on its shape. The difficulty is akin to that of chess: chemists know the laws well enough to roughly compute specific scenarios, but there are so many possible configurations that searching them all is hopeless. But what if protein folding could be framed as a game? It already has been. Since 2008, hundreds of thousands of players have tried Foldit, an online game in which users score points for the stability and feasibility of the protein structures they fold. A machine might train itself in a similar way, perhaps by using reinforcement learning to try to beat its own previous best score.
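As a speculative sketch of what “folding as a game” could look like, the loop below scores candidate folds and treats beating its own best score as the win condition. Both `stability` and `propose_fold` are hypothetical placeholders, and a serious system would use reinforcement learning rather than this bare hill-climbing caricature.

```python
def folding_game(sequence, stability, propose_fold, steps=10_000):
    """Search for ever more stable folds, Foldit-style: the 'reward'
    is improving on your own previous best score."""
    best_fold = propose_fold(sequence, current=None)
    best_score = stability(best_fold)
    for _ in range(steps):
        fold = propose_fold(sequence, current=best_fold)  # try a variation
        score = stability(fold)
        if score > best_score:  # "winning" = beating your own record
            best_fold, best_score = fold, score
    return best_fold, best_score
```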
Reinforcement learning and self-play might also help train interactive systems, Sutskever suggests. That could give robots that need to talk to people a chance to practice by talking to themselves. And with specialized AI hardware becoming faster and more accessible, engineers have a growing incentive to cast tasks in the form of games. “I think that going forward, self-play and other ways of consuming very large amounts of computing power will become more and more important,” Sutskever said.
But if the ultimate goal is for machines to do everything a human can, then even a generalized board-game champion like AlphaZero still has room to grow. “You have to notice — at least it's obvious to me — the huge gap between genuine thinking, the creative exploration of ideas, and what AI is capable of today,” says Josh Tenenbaum, a cognitive scientist at MIT. “That kind of intelligence exists, but so far only in the minds of the great AI researchers.”
Many other researchers, sensing the hype around their field, offer their own caveats. “I would caution against overestimating the significance of these games for AI or for general-purpose tasks. Humans are not very good at games,” said François Chollet, a deep-learning researcher at Google. “But bear in mind that even very simple, specialized tools can accomplish a great deal.”