From July 19 to July 25, the DeepHack hackathon took place, where participants improved a reinforcement learning algorithm based on Google DeepMind's work. The goal of the hackathon was to learn to play classic Atari games (Space Invaders, Breakout, etc.) better than a human. We want to explain why this matters and how it went.
Authors of the article: Ivan Lobov (IvanLobov), Konstantin Kiselev (mrKonstantin), Georgy Ovchinnikov (ovchinnikoff).
Photos of the event: Maria Molokova, Polytechnic Museum.
Why a reinforcement learning hackathon is cool:
- It is the first Russian hackathon on deep learning and reinforcement learning;
- Google DeepMind's algorithm is one of the latest advances in reinforcement learning;
- If you are interested in artificial intelligence, this topic comes very close to that concept (although we ourselves would not call it AI).

Where does the interest in reinforcement learning research come from?
Let's start with the current state of machine learning and what it can solve. There are three main areas (intentionally simplified):
- Supervised learning: any task where the algorithm is trained on examples with known answers. This includes regression and classification. Real-world tasks: estimating real estate value, forecasting sales, predicting earthquakes;
- Unsupervised learning: tasks without "answers", where you need to find patterns in the data, some notion of "similarity" or "dissimilarity". Real-world tasks: customer clustering, association rule mining;
- Reinforcement learning: an intermediate type of task where learning happens through interaction with an environment. The algorithm (the agent) performs actions in the environment and occasionally receives feedback. Many "interesting" human activities fall into this class: sports (at any given second there is no single "right" action, only a result: a goal is scored or not), negotiations, the process of scientific research, and so on.
So, supervised learning tasks have already been solved for almost every area. The next big step is the areas of machine learning where interaction has no obvious, immediate result. This is the direction the field is moving in now.
How reinforcement learning algorithms work, using Atari games as an example (intuition)
There is an agent and an environment: the game. To the agent the game is a black box: it does not know the rules and does not even realize it is a game. At the start of training, the game gives the agent the set of actions it can perform, and all actions look identical to the agent. Then, at each step, the agent performs one of the actions and receives in response the state of the game (the screen image) and a reward. The reward is the payoff for the actions performed, and it can be positive, negative, or zero. During training, the agent chooses actions, tries them in the game, and collects rewards. The agent's task is to develop a strategy that maximizes the total reward.
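The loop described above can be sketched in a few lines of Python. The toy environment here is a hypothetical stand-in for an Atari game: it hides its rules from the agent and reports only a state, a reward, and whether the episode is over.

```python
import random

class ToyGame:
    """A black-box environment: the agent sees only states and rewards."""
    def __init__(self):
        self.position = 0      # hidden internal rule: reach position 5 to win
        self.actions = [0, 1]  # the agent is told only which actions exist

    def step(self, action):
        # Action 1 moves right, action 0 moves left (unknown to the agent).
        self.position += 1 if action == 1 else -1
        done = self.position >= 5
        reward = 1 if done else 0   # feedback is sparse: zero almost always
        state = self.position       # in Atari this would be the screen image
        return state, reward, done

def play_episode(env, policy, max_steps=1000):
    """Run one episode and return the total reward collected."""
    total, done, steps = 0, False, 0
    while not done and steps < max_steps:
        action = policy(env.actions)
        state, reward, done = env.step(action)
        total += reward
        steps += 1
    return total

# An untrained agent just presses buttons at random.
random.seed(0)
score = play_episode(ToyGame(), lambda actions: random.choice(actions))
```

The agent's whole job is to replace the random policy with one that makes `score` as large as possible.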

In effect, we imitate how a person learns a game (very roughly, since we do not actually know exactly how humans learn). There are differences: an adult already has a lot of experience and a broad set of associative concepts, which lets them grasp the rules of a game at a glance and learn quickly. Our agent's learning is more like that of a two-year-old child: at first it presses the control buttons arbitrarily, and gradually, picking up the rules and principles of the game, it starts to play better and better.
Training requires a model with which the algorithm approximates the rules of the agent's interaction with the environment. One common technique, which DeepMind successfully applied to Atari games, is Q-learning. It models the reward function Q: the expected total reward for taking a given action in a given state. At test time, the agent then selects actions by a simple rule: take the action that maximizes Q.
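A minimal sketch of the idea in tabular form (the neural-network version approximates the same Q function; the chain environment and all constants here are illustrative):

```python
import random

# Tabular Q-learning on a tiny chain: states 0..4, actions 0 (left) / 1 (right).
# Reaching state 4 gives reward 1; everything else gives 0.
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma = 0.5, 0.9          # learning rate and discount factor
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

random.seed(1)
for _ in range(200):                      # training episodes
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS)        # explore by acting randomly
        s2, r = step(s, a)
        # The core Q-learning update: move Q(s, a) toward r + gamma * max Q(s2, ·).
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# At test time the agent is greedy: it picks the action with the highest Q.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

After training, the greedy policy walks right toward the reward in every non-terminal state, even though the agent was never told the rules.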
The model can be a decision tree, a multidimensional linear function, a neural network, etc. For complex high-dimensional data such as images, convolutional neural networks have proven themselves well. DeepMind's innovation was to combine convolutional networks with Q-learning.
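Schematically, this combination replaces the Q table with a network that maps the raw screen directly to one Q-value per action. A minimal sketch of that interface, where a fixed random linear map stands in for the learned convolutional network (the 84x84 frame size follows DeepMind's preprocessing; the weights here are not learned):

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 4                        # e.g. noop / fire / left / right
# Stand-in "network": one row of weights per action over the flattened screen.
W = rng.normal(size=(N_ACTIONS, 84 * 84)) * 0.01

def q_values(screen):
    """Map a raw 84x84 screen to one estimated Q-value per action.

    In DQN this mapping is a convolutional network trained with the
    Q-learning update; here a fixed linear map only shows the interface.
    """
    return W @ screen.reshape(-1)

screen = rng.random((84, 84))                # a fake game frame
action = int(np.argmax(q_values(screen)))    # greedy action selection
```

The point of the design is that nothing about the agent changes between games: only the screen pixels and the number of actions differ.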
What will the development of machine learning bring?
Today we can teach a computer to play simple games better than humans, which does not impress most people much. The next step is to teach a computer to play, say, Doom, i.e. make it learn in a 3D environment, and then gradually increase the complexity of the games. The main goal is to work out general principles for finding optimal solutions in complex environments and to apply those principles, in the form of algorithms, to the real world. In this way, machines that play games can build an effective representation of their environment and use it to generalize past experience to new situations.

If you manage to make a computer learn to play, say, Need for Speed, and to play it well, then the same algorithms, with small modifications, can be used to teach robots to drive real cars. And not only cars: it would open the way to the mass use of robots, from personal assistants to smart self-service systems and a smart urban environment in which vehicles, under human supervision, independently serve the city's entire infrastructure.
Now we understand why Google acquired DeepMind for more than $400 million.
How did the hackathon go
The organizers prepared seriously: 7 full days with accommodation, a competition for participation of 5 applicants per seat, lecturers from among the top-10 researchers in machine learning, 15 GPU clusters, and 24/7 support for participants on any issue. Venue: the MIPT campus in Dolgoprudny.
Competition procedure:
What did all this run on? (Software)
Stella (Atari emulator) -> ALE (Arcade Learning Environment) -> a machine learning framework of your choice.
All solutions were based on Google DeepMind's open-source code and the 2015 Nature paper. Teams solved the problem in one of three ML frameworks: Lua + Torch (the original code is written in it), Python + Theano, or C++ + Caffe. We chose Python + Theano, since we had more experience with it. We cannot clearly name a best framework: each had its drawbacks. In general, the field feels young, so there is little proven, well-functioning code; much has to be rewritten, rechecked, and debugged. We found no significant advantage for any framework, neither in computation speed (the bottleneck is still the convolutions in cuDNN) nor in prototyping convenience.
What did all this run on? (Hardware)
For computation, each team was allocated a cluster with 4 GRID K520 GPUs on AWS (g2.8xlarge), so up to 4 runs could go simultaneously. That was enough to run a number of tests over the week (a full training run takes ~5 days on one GPU, though we ran short ten-hour tests), which helped check the first hypotheses. It is not enough for a full-fledged study, though, so some teams are continuing their work after the hackathon. Although the limit here is not so much hardware as wall-clock time.
Examples of how a model trained in 24 hours plays:
Space Invaders
Kung-Fu Master
How was it (photos)
Still fresh, discussing ideas:

The first (?) night, right in the coworking space:

Despite the fatigue, everyone listens to lectures:

The last days before the finals. Several days almost without sleep. Only the most persistent are still working:

The finals at ENEA. Everything has already been decided; all that remains is to root for the trained models:

Hackathon participants:

Instead of a conclusion
We all have very different backgrounds: advertising, IT, science. But one thing unites us: we see the future in the development of machine learning, and we understand that very few people in Russia work on it:
- Educational institutions: only a handful have professors who publish anything;
- Companies that actually use it in production: two hands are enough to count them, and three fingers to count those who use deep learning at scale;
- There are hundreds of specialists, maybe a thousand (?), who can assemble a convolutional network.
At the same time, all the knowledge, software, and even hardware for deep learning are available to any student, and the entry barrier in terms of effort is not prohibitive.
Where we would like to move:
- Popularizing the possibilities of ML among ordinary people;
- Popularizing ML and deep learning among students;
- Promoting ML and deep learning in business.
If you have ideas or suggestions on how to do this, write in the comments.
P.S. Many thanks to all the organizers, especially Evgeny Botvinovsky, Sergey Plis, Mikhail Burtsev, Andrei Pakosh, Elizaveta Chernyagina, Vitaly Lvovich Dunin-Barkovsky, Valeria Tsvela, and others.