Researchers' dream of creating a universal artificial intelligence has led to the emergence of a wealth of services where you can try a new algorithm on completely different tasks and evaluate how versatile it is: which tasks it handles well, and which it finds difficult.
This article provides a brief overview of twelve such services.
ALE: Arcade Learning Environment
→ Introductory article
→ Repository

A platform for the development and evaluation of machine learning algorithms. It provides an interface to hundreds of Atari 2600 games, each of which is unique and was designed to be interesting to people. This variety of games lets researchers attempt to build truly general algorithms and compare their results with one another.
For an algorithm operating in the ALE environment, the world looks quite simple. Observations are two-dimensional arrays of 7-bit pixels (160 by 210 pixels in size). The possible actions are the 18 signals that can, in principle, be generated by the console joystick. The way rewards are obtained can vary from game to game, but as a rule it is the difference in score between the current and previous frames.
In standard mode the Atari emulator generates 60 frames per second, but on modern hardware it can run much faster: figures of about 6000 frames per second are reported.
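The interface an agent sees in ALE can be sketched with a minimal stub (a hypothetical mock for illustration only, not the actual ALE bindings): 210×160 observations of 7-bit pixels, 18 joystick actions, and reward computed as the score difference between consecutive frames.

```python
import random

class MockALE:
    """Hypothetical stand-in mimicking the shape of the ALE interface:
    210x160 observations of 7-bit pixels, 18 joystick actions,
    reward = score difference between consecutive frames."""

    WIDTH, HEIGHT, NUM_ACTIONS = 160, 210, 18

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.score = 0

    def get_screen(self):
        # 2-D array of 7-bit pixel values (0..127)
        return [[self.rng.randrange(128) for _ in range(self.WIDTH)]
                for _ in range(self.HEIGHT)]

    def act(self, action):
        # One of the 18 joystick signals; reward is the score delta
        assert 0 <= action < self.NUM_ACTIONS
        new_score = self.score + self.rng.choice([0, 0, 1])
        reward = new_score - self.score
        self.score = new_score
        return reward

env = MockALE()
frame = env.get_screen()
total = sum(env.act(env.rng.randrange(18)) for _ in range(100))
```

Because reward is always a score delta, the sum of rewards over an episode equals the final score, which is exactly what ALE agents maximize.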
MAgent
→ Introductory article
→ Repository

An environment for simulations focused on experiments that can involve hundreds to millions of agents. Unlike other environments that claim multi-agent support but are in fact limited to dozens of agents, MAgent scales well and can support up to a million agents on a single GPU.
All these efforts are aimed not only at teaching a single agent optimal behavior, but also at exploring the social phenomena that arise among a large number of intelligent agents: self-organization, communication between agents, leadership, altruism, and much more.
MAgent gives researchers the flexibility to customize their environments. The demo version contains three preset experimental setups: pursuit (predators must band together to effectively chase herbivores), resource gathering in a competitive setting, and a battle of two armies (agents must master encirclement tactics, "guerrilla warfare", etc.).
Malmo
→ Introductory article

A platform for fundamental machine learning research built on the popular game Minecraft. Minecraft is a 3D game in which a dynamic world of any desired complexity can easily be created. The platform provides an API for controlling the agent, creating tasks for it, and conducting experiments.
Interesting and difficult.
ViZDoom
→ Project site

An environment for computer vision and reinforcement learning experiments based on the popular 3D game Doom. You can create your own scenarios and maps, use multiplayer mode, a mode in which the learning agent watches a human player's actions, and so on. The environment is fast (up to 7000 FPS per thread) and runs on both Linux and Windows.
It provides an easy-to-use API for C++, Python, and Java, optimized for use in reinforcement learning algorithms. As observations, the image from the screen buffer is passed to the learning algorithm; a depth map can also be provided.
The project site offers a tutorial, video demonstrations, examples, and detailed documentation.
ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
→ Introductory article
→ Repository

A platform for fundamental research on reinforcement learning algorithms.
It allows hosting C/C++-based games (as ALE does). In addition, the developers have built a simplified real-time strategy (RTS) game on top of ELF that can run at up to 4000 FPS per core on a laptop. Such performance lets you train algorithms faster than in environments built on ordinary RTS games, which are not optimized to run faster than real time. Tower Defense and Capture the Flag variants are also available.
Those interested may also want to watch the presentation by Yuandong Tian of Facebook Research from ICML 2017.
MazeBase
→ Introductory article
→ Repository

Unlike systems built on games originally created to entertain people, this project focuses on games designed specifically for testing reinforcement learning algorithms. Games built on the platform can be modified, and new ones can be created.
Out of the box, the system contains a dozen simple 2D games based on a "world of cells". In building it, the developers drew inspiration from the classic Puddle World, but supplemented it with their own ideas and made the map regenerate each time a new training episode starts. Thus, the agent trains every time in a world it has not yet seen.
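The map-regeneration idea can be illustrated with a small sketch (a hypothetical simplified grid world, not the actual MazeBase code): each reset draws a fresh random layout, so the agent does not train twice on the same map.

```python
import random

class RegeneratingGridWorld:
    """Toy 2-D cell world that rebuilds its map on every reset,
    in the spirit of MazeBase (hypothetical, heavily simplified)."""

    def __init__(self, size=8):
        self.size = size
        self.episode = 0

    def reset(self):
        # A new random wall layout for every training episode
        rng = random.Random(self.episode)
        self.walls = {(rng.randrange(self.size), rng.randrange(self.size))
                      for _ in range(self.size)}
        self.goal = (self.size - 1, self.size - 1)
        self.walls.discard(self.goal)  # keep the goal cell reachable
        self.episode += 1
        return (0, 0)  # agent's start position

env = RegeneratingGridWorld()
start = env.reset()
first_walls = set(env.walls)
env.reset()
second_walls = set(env.walls)
```

Seeding the generator with the episode counter keeps runs reproducible while still giving a different map each episode.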
OpenAI Gym / Universe
→ Gym introductory article
→ Universe repository

Gym is a toolkit for research on reinforcement learning algorithms. It includes a constantly growing collection of test environments for experiments. The project website lets you share your results and compare them with those of other participants.
The Universe environment allows virtually any program to be turned into a test environment without access to its internal variables or source code. The program is placed in a Docker container, and interaction with it happens through emulated keyboard presses and mouse events. More than 1000 environments are available (mainly various games) in which an AI agent can take actions and receive observations. Of that thousand, a few hundred also provide a reward signal for the action taken. Such environments include scripts for "clicking through" a program's start menus to get straight to the content of the game or application.
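Gym's environment contract, reset() returning an observation and step() returning an (observation, reward, done, info) tuple, can be shown with a tiny self-contained environment that follows the same interface (a sketch of the classic pre-0.26 Gym API; the environment itself is invented for illustration):

```python
class CountdownEnv:
    """Minimal environment following the classic Gym interface:
    reset() -> observation; step(action) -> (obs, reward, done, info)."""

    def __init__(self, start=5):
        self.start = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        # action 1 decrements the counter; reaching 0 ends the episode
        self.state -= action
        done = self.state <= 0
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}

# The standard agent loop used with any Gym-compatible environment
env = CountdownEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    obs, reward, done, info = env.step(1)
    total_reward += reward
```

Because every Gym and Universe environment exposes this same loop, an agent written against it can be tested across hundreds of tasks without modification.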
Perhaps Gym is the best choice for beginners.
TensorFlow Agents
→ Introductory article
→ Repository

The developers call TensorFlow Agents an infrastructure paradigm. Its main focus is accelerating the training and testing of algorithms through parallel execution of a large number of simulation environments and batch processing of data on the GPU and CPU. This widens the bottleneck inherent in most other platforms and speeds up the algorithm debugging cycle. The environments themselves are applications supporting the OpenAI Gym interface, and as noted above, there are plenty of those to choose from.
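The batching idea can be sketched as a wrapper that steps several Gym-style environments in lockstep and stacks the results (a simplified illustration of the concept, not the library's actual implementation; the toy environment is invented for the demonstration):

```python
class ToggleEnv:
    """Tiny Gym-style environment used only for this demonstration."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += action
        return self.t, float(action), self.t >= 3, {}

class BatchEnv:
    """Steps several environments together and stacks the results,
    mimicking (in simplified form) the batching used by TF Agents."""
    def __init__(self, envs):
        self.envs = envs
    def reset(self):
        return [env.reset() for env in self.envs]
    def step(self, actions):
        # One step per environment, then transpose into per-field lists
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones, infos = map(list, zip(*results))
        return obs, rewards, dones, infos

batch = BatchEnv([ToggleEnv() for _ in range(4)])
obs0 = batch.reset()
obs, rewards, dones, infos = batch.step([1, 1, 1, 1])
```

The stacked observations and rewards can then be fed to the learning algorithm as one batch, which is where the GPU/CPU batch-processing speedup comes from.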
Unity ML Agents
→ Repository

Simulation environments for machine learning can now be created with the Unity Editor; they run on the Unity Engine. Under the proposed paradigm, you need to define and implement code for three objects: Academy, Brain, and Agent.
Academy - the general settings of the environment and its internal logic. The Academy is also the parent object for the remaining entities of the model.
Brain - an object that describes the decision logic. Several options are available: an interface to TensorFlow (via an open socket and the Python API, or via TensorFlowSharp), hand-written heuristic logic, or waiting for keyboard and mouse input so that a human operator can control the agent directly.
Agent - an object with its own unique set of states and observations that performs a unique sequence of actions within the simulation environment; the "body" of the simulated object.
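The relationship between the three objects can be sketched as follows (a hypothetical Python illustration of the paradigm only; the real implementation is written in C# inside the Unity Editor, and all class internals here are invented):

```python
class Brain:
    """Decision logic; in Unity this could delegate to TensorFlow,
    a heuristic, or a human operator (simplified hypothetical version)."""
    def decide(self, observation):
        # Trivial stand-in heuristic: keep moving until position 10
        return 1 if observation < 10 else 0

class Agent:
    """The 'body': holds its own state and observations, and asks
    its Brain for each action."""
    def __init__(self, brain, position=0):
        self.brain = brain
        self.position = position
    def act(self):
        action = self.brain.decide(self.position)
        self.position += action
        return action

class Academy:
    """Global environment settings; parent of the other entities."""
    def __init__(self, num_agents=2):
        brain = Brain()  # one Brain may be shared by several Agents
        self.agents = [Agent(brain) for _ in range(num_agents)]
    def step(self):
        return [agent.act() for agent in self.agents]

academy = Academy()
actions = academy.step()
```

Note the ownership direction: the Academy owns the Agents, and each Agent queries a Brain, so swapping the decision logic never requires touching the Agents themselves.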
There are also built-in tools for monitoring the internal state of agents, the ability to use several cameras at once as observations (which may matter, for example, when learning to combine data from several sources, as happens in autonomous vehicles), and much more.
DeepMind Pycolab
→ Introductory article
→ Repository

In essence, this is a game engine for building simple games with ASCII graphics. Thanks to their simplicity and light weight, such games make it possible to debug reinforcement learning algorithms even on relatively weak hardware.
Among the ready-made examples are "Space Invaders", "Labyrinth", an analogue of "Supaplex", and several other small games.
SC2LE (StarCraft II Learning Environment)
→ Introductory article
→ Repository

An environment for learning to play StarCraft II. StarCraft II is a difficult machine learning problem that many of the best minds are working on right now: hundreds of units, incomplete map information due to the "fog of war", an enormous variety of development strategies, and rewards delayed by thousands of steps. It looks like StarCraft II will be the next major milestone in machine victories over humans, after Go.
The environment provides open-source Python tools for interacting with the game engine. In addition to the standard game maps, the developers have created several mini-games of their own for debugging particular gameplay elements, such as resource gathering, combat, etc.
Also of interest are the available recordings of professional players' games and the results of testing classical machine learning algorithms on this task.
Coach
→ Project site
→ Repository

A modular Python environment for debugging and training reinforcement learning algorithms. It lets you assemble simulated agents "from pieces" and harness the full power of multiprocessor systems when evaluating algorithms and training models.
It contains built-in state-of-the-art implementations of many machine learning algorithms and can be a good starting point for those who want to try out how various algorithms work without diving deeply into the details of their implementation.
Coach can collect statistics about the learning process and supports advanced visualization techniques that help in debugging trained models.
Instead of conclusion
If I missed something, please write in the comments.
And if you take a vacation from February 26 to March 7, you can rest for 17 days straight. By now you should have more ideas than you can try in that time.