
The AlphaGo Zero AI platform honed its go skills without human intervention



DeepMind, a division of the Alphabet holding company, continues to work on improving artificial intelligence. It was DeepMind's specialists who created AlphaGo, the platform that became the go world champion. It beat several world champions at go, after which it became clear that a human would never again be able to beat the machine.

DeepMind recently announced an even more powerful computer go system that plays better than all previous versions of AlphaGo. The new system is called AlphaGo Zero. It learned to play go entirely on its own, without studying games played by humans.
The "knowledge base" of AlphaGo Zero contains the rules of go and nothing else. Nevertheless, the program improves very quickly by playing against itself. The developers claim that Zero mastered the rules of the game in just a few hours. After three days of self-play, AlphaGo Zero defeated AlphaGo Lee, the version of the AI that beat Lee Sedol 4:1 in 2016.

After 21 days, the system played at the level of AlphaGo Master, the version of the platform that this year beat the world's best go players from the top-60 list, including world champion Ke Jie in all three of their games.

After 40 days of training on games against itself, Zero easily handled all of its predecessors. It defeated the system that won against Lee Sedol with a score of 100:0. As it learned, the system built a "tree" of possible moves, assessing the consequences of each.

The developers gave the new system only basic information about the rules of the game. No records of champions' games were included. The system learned everything by itself, playing against a copy of itself millions of times. Each move took about 0.4 seconds. A human would need several thousand years to play through the same number of games. After each new game, the weights of the neural network and other components are updated. Interestingly, AlphaGo Zero uses a single neural network rather than two, as in previous versions.
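The "several thousand years" estimate can be checked with rough arithmetic. The sketch below is only an illustration: the figure of 29 million self-play games and the pace of one hour per human game played around the clock are assumptions, since the article itself says only "millions" of games.

```python
# Rough check of the "several thousand years" claim.
# Both inputs are assumptions for illustration, not figures from the article.
ASSUMED_GAMES = 29_000_000         # assumed total number of self-play games
ASSUMED_HOURS_PER_HUMAN_GAME = 1.0 # assumed pace of a human playing non-stop
HOURS_PER_YEAR = 24 * 365


def human_years(games: int, hours_per_game: float) -> float:
    """Years a human would need to play `games` games back to back."""
    return games * hours_per_game / HOURS_PER_YEAR


if __name__ == "__main__":
    years = human_years(ASSUMED_GAMES, ASSUMED_HOURS_PER_HUMAN_GAME)
    print(f"about {years:,.0f} years")
```

Under these assumptions the result is on the order of three thousand years, which is consistent with the claim above; a slower or less tireless human would need even longer.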

The creators of the system say that in this case there is no need to fear the power of AI. According to them, its style of play resembles that of some masters, but only at the very beginning of a game. By about the middle of the game, experts usually cannot discern any particular strategy, and the system seems to act randomly. In reality this is not so: all moves are carefully planned and aimed at winning.

Google first spoke about AlphaGo in 2015. That system used two neural networks. The first estimated the probability of particular moves being played; the second evaluated the position of the stones on the board during the game. Initially, the system was trained on games played by humans. In addition to the neural networks, AlphaGo also used Monte Carlo tree search, a technique often found in strong computer go programs. With it, the machine selects the best move by analyzing various candidate moves. Over time, the developers of AlphaGo added new capabilities using reinforcement learning, in which the system is trained without a training set of games.
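To make the idea of Monte Carlo tree search more concrete, here is a minimal sketch of the plain UCT variant with random playouts. It is not AlphaGo's implementation: there are no neural networks, and go is replaced by a toy game assumed for illustration (a pile of stones, each player takes 1 to 3, whoever takes the last stone wins). All names here (`Node`, `uct_child`, `rollout`, `mcts_best_move`) are hypothetical.

```python
import math
import random


class Node:
    """One position in the search tree of the toy stone-taking game."""

    def __init__(self, pile, player, parent=None, move=None):
        self.pile = pile        # stones left in the pile
        self.player = player    # player to move here (0 or 1)
        self.parent = parent
        self.move = move        # move that led to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node


def uct_child(node, c=1.4):
    """Pick the child with the best balance of win rate and exploration."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(pile, player, rng):
    """Play random moves to the end; return the player who takes the last stone."""
    while True:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return player
        player = 1 - player


def mcts_best_move(pile, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(pile, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.pile - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout (or read off the result if terminal).
        if node.pile == 0:
            winner = 1 - node.player  # the previous mover took the last stone
        else:
            winner = rollout(node.pile, node.player, rng)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            if winner != node.player:
                node.wins += 1
            node = node.parent
    # The most visited move at the root is taken as the best one.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(mcts_best_move(5))
```

In this toy game the winning strategy is to leave the opponent a multiple of 4 stones, so for a pile of 5 the search should settle on taking one stone. In AlphaGo, the networks described above stood in for the blind exploration and random playouts used here.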


Seven-time European champion Alexander Dinerstein (3 dan professional, 7 dan EGF) shared his opinion of the new system with us.

The machine learned entirely on its own. Previous versions of AlphaGo first worked through a set of human games to pick up the rules and only then played against copies of themselves to hone their play. AlphaGo Zero played only against itself and learned everything on its own, yet it beat even AlphaGo Master, which played against Ke Jie in May. What do you make of the fact that, in evaluating AlphaGo Zero, the researchers do not even mention a match against a human and use only another computer system as the benchmark?

It seemed to me that Zero began to play in a more human way. The moves became easier to understand, and there is less of what we call tenuki, which is when the program abruptly changes its plans, essentially ignoring the opponent's last move. On the downside, the program still repeats the same patterns in the openings, which makes the games less spectacular. Go in these games even resembles chess with its long, thoroughly studied openings. In human games, by contrast, a position that has never been seen before often arises after the first 5-10 moves, and such games are much more interesting to analyze.

I expected that they would show us handicap games. After all, there were claims that the latest version of Alpha could give a 4-stone handicap to the one that played against Fan (the European champion). Alas, these games are still kept secret.

Nothing is heard about new matches. And there do not seem to be any pros willing to play. They apparently understand that they have no chance in an even game, while playing with a handicap is a blow to the ego.

In their paper, the developers note how AlphaGo Zero gradually discovered some joseki (opening sequences), including one that occurs in professional play. In the same paper, the researchers note that the algorithm exhibits some properties characteristic of human play: seizure of territory, greed, zones of influence. Do you think it is correct to call a computer go system a weak form of artificial intelligence?

On the novelties in the openings: as in the earlier games of AlphaGo Lee and AlphaGo Master, we encounter moves that people considered bad. I have been teaching go for 15 years and remember scolding my students for such moves. Now all go professionals try to copy them, even the proud Japanese, who rarely adopted Chinese and Korean innovations. Everyone agrees that Alpha's ideas are strong, and no one even tries to refute them.

How did AlphaGo change go philosophy? Have new strategies already appeared? How can the completely “non-human” AlphaGo Zero change the world of go?

The ideas of AlphaGo have made the game more boring in the openings. And this is good. People will continue to be interested in the games of professionals and to follow their innovations. Today there are no programs, either for sale or as open source, that play at pro strength. We expect the Japanese DeepZenGo 7 in November of this year. It will play at the strength of a top pro (this is confirmed, as it is being actively tested on go servers). That is when the first problems will begin. We will find ourselves in the shoes of chess players, with their eternal suspicions of cheating. And tournaments on go servers will suffer. But it is inevitable, although no one imagined that this would happen so quickly.

Is it now a fact that, in human versus computer matches, it is the algorithm that will have to give a handicap to the protein player, and not the other way around?

The question of handicap is very difficult. The program shows that it is stronger than the best protein masters, but by how much? Lee Sedol, for example, is sure that he would not lose a match with a 2-stone handicap. It would be interesting to hold a match with a floating handicap, in the format that Go Seigen used in the middle of the last century. But which of the top pros would agree to this? Pros have beaten other pros with a two-stone handicap before: recall, for example, the match of Cho Hongheng against the top five contenders for Korean titles in the 80s. As far as I remember, it was the last match of this kind. And what if not two stones are needed, but 3 or 4? Can you imagine Kasparov playing a match against a machine that gives him rook odds? I can't!

A curious question. One of the Alpha programmers previously worked on Giraffe, a self-learning chess program that learned to play at master strength in 72 hours. He probably gained a lot of experience working on that program. I wonder whether he could write a new chess program by analogy with Alpha. Or does the neural network approach not work in chess? I am very interested in the answer to this question.

Source: https://habr.com/ru/post/373919/

