Generating game levels with neural networks

Foreword

Over the past few years, progress in artificial intelligence has produced machine learning methods based on representation learning with several layers of abstraction, the so-called “deep learning”. Public and media attention was drawn to this area of research by the ancient Chinese board game Go. Although the complexity of Go is often compared to that of life itself, the AlphaGo program, which combines deep learning with reinforcement learning, managed to defeat the world champion Lee Sedol. It is striking that AI research applied to a game received such wide public attention. It is also worth noting that one of the AlphaGo developers, Demis Hassabis, was the lead programmer of Theme Park (1994) and the lead AI programmer of Black & White (2001). Games and modern AI progress may well be connected.

This article is a post-mortem: a report on our team’s attempt to implement level generation for Fantasy Raiders using various neural network methods. Traditionally, level generation has meant encoding a game developer’s knowledge with certain probabilistic techniques. For Fantasy Raiders, however, we wrote a program that could learn from our data and generate levels based on it. In our view, what we obtained is only a key to the level generation problem, not a general solution. To share our findings with other game developers, we describe our research process from beginning to end.

[Fig. 1. Generating Levels with Neural Network]

Complexity assessment

At the first stage of our research, we had to make sure that an artificial neural network could learn from the levels of Fantasy Raiders. We therefore started with the simplest task: assessing the difficulty of levels that had already been built.
Fantasy Raiders is an RPG that aims to create, for each player, levels that match their skills and tastes in order to provide engaging gameplay. The unit of a level is a self-contained room, similar to what can be seen in The Binding of Isaac (2011). For convenience, we will refer to each Fantasy Raiders room as a “level”.

The game recommends a new level (room) of suitable difficulty to the player, depending on the state of the player’s character and the difficulty of the current level (room). The difficulty of each room is calculated by an algorithm that evaluates the NPCs and items in it.

[Fig. 2. The complexity of the levels recommended in the sequence of levels]

The simplest part of the assessment was assigning numerical values to NPCs, for example their HP (health points). Evaluating the interactive objects in a room was much harder. That is why we decided to check whether an artificial neural network could replace this heuristic algorithm.
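
As a rough illustration of the kind of heuristic described above, the sketch below scores a room by summing NPC hit points. The `Npc` type, its fields and the bucket thresholds are all hypothetical; the actual Fantasy Raiders formula is not given in this article.

```python
from dataclasses import dataclass


@dataclass
class Npc:
    name: str  # hypothetical identifier
    hp: int    # health points, the one value that is easy to score


def heuristic_difficulty(npcs, thresholds=(50, 150, 300, 600)):
    """Map total NPC HP to a 1-5 difficulty bucket (thresholds are made up)."""
    total_hp = sum(npc.hp for npc in npcs)
    # Count how many thresholds the total exceeds; none exceeded -> difficulty 1.
    return 1 + sum(total_hp > t for t in thresholds)


room = [Npc("goblin", 40), Npc("goblin", 40), Npc("ogre", 120)]
print(heuristic_difficulty(room))  # 3
```

Interactive objects such as traps or fences have no obvious counterpart to HP, which is where a formula of this kind runs out.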

Data collection

For artificial intelligence to learn, it needs data. To classify levels automatically by difficulty, each data item must be a “level - difficulty” pair. However, labeling each level with the existing algorithm would not produce meaningful results, because a model trained on such labels could only reproduce the algorithm itself.

At first, we considered building a gameplay bot that would play each level and estimate its difficulty from the results. However, it was almost impossible to speed up gameplay enough to make this practical for machine learning. We therefore abandoned this plan and asked three developers from our team to rate all the levels we had on a five-point scale. They managed to rate about 20 levels a day, so rating all 1000 levels took two months.

[Fig. 3. An example of assessing the complexity of levels.]

Difficulty prediction

Once the data had been labeled, we took screenshots of the levels in the editor, which served as an abstract version of each level. We trained the program on a data set of “level editor screenshot - difficulty” pairs. We used images from the level editor because at low resolution they are more distinguishable than in-game screenshots, which makes them more effective in terms of training speed and input quality. For this task we chose a CNN (convolutional neural network), because CNNs are better suited to image classification than other neural networks. To judge the quality of its results, we used a formula derived by one of our game designers as the baseline.

Prediction based on the formula developed by the game designer: 42% accuracy
Prediction based on level editor screenshots (CNN): 62% accuracy

Even with a standard CNN model, accuracy increased by 20 percentage points. We tried more complex CNN models several times, but without any significant improvement. The results were held back by the limited amount of data (approximately 1000 pairs).

[Fig. 4. Predicting Difficulty Using CNN]

Following the path suggested by the work of David Silver et al. (2016) and by Kim’s presentation “Do you want to know how to do it?” (2016) at the Nexon Developers Conference 2016, we understood that to get better results we needed to diversify the input data. According to Silver et al. (2016), the input for AlphaGo included, in addition to the positions of the white and black stones, the number of moves since the start of the game, the number of “dead” stones, and contextual, preprocessed information about the board (such as “ladders”). Similarly, in Kim’s presentation, information about the NPCs and the terrain was preprocessed to assess the difficulty of a game level.

The level editor screenshots were made up of graphical elements because they had to be human-readable. But not all the information about each NPC, object or item is represented in those graphical elements. So, again with the help of our designer colleagues, we reclassified the pieces of information that may affect difficulty. All pieces of information of the same kind are classified into a group, and each group has a unique value in one of the channels: R, G, B or A. The more kinds of information we included, the higher the accuracy we could expect, but the complexity of generating levels also increased. Through trial and error we therefore settled on four basic kinds of information to use in the difficulty assessment.
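
A minimal sketch of this kind of encoding, assuming a level is a grid of tiles and that each of the four channels carries one category of information. The channel assignments (R for NPC type, G for object type, B for terrain, A for item) and all numeric IDs are hypothetical; the article does not list the four kinds of information that were actually used.

```python
# Hypothetical lookup tables: each category gets its own channel,
# and each kind within a category gets a unique value in that channel.
NPC_IDS = {None: 0, "goblin": 1, "ogre": 2}
OBJECT_IDS = {None: 0, "fence": 1, "trap": 2}
TERRAIN_IDS = {"floor": 0, "wall": 1}
ITEM_IDS = {None: 0, "potion": 1}


def encode_tile(tile):
    """Turn one tile (a dict) into an (R, G, B, A) tuple of small integers."""
    return (
        NPC_IDS[tile.get("npc")],
        OBJECT_IDS[tile.get("object")],
        TERRAIN_IDS[tile.get("terrain", "floor")],
        ITEM_IDS[tile.get("item")],
    )


def encode_level(grid):
    """Encode a 2D list of tiles into a 2D list of RGBA tuples, one pixel per tile."""
    return [[encode_tile(tile) for tile in row] for row in grid]


level = [
    [{"terrain": "wall"}, {"terrain": "wall"}],
    [{"npc": "ogre"}, {"object": "fence", "item": "potion"}],
]
encoded = encode_level(level)
print(encoded[1][1])  # (0, 1, 0, 1)
```

Because every tile becomes a single pixel, the encoded image is far smaller than a screenshot while still carrying information that the graphics alone do not show.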

[Fig. 5. A level editor screenshot (above, RGBA) and the encoded image (below, R, G, B, A). Each image is color-calibrated to 128 gray levels to improve its visibility on screen.]

Prediction using encoded images (baseline, logistic regression): 61% accuracy
Prediction using encoded images (CNN): 71% accuracy

By changing the input data, accuracy with the same model structure increased by about 10 percentage points. Moreover, the size of each image needed for training shrank by a factor of 64, which sped up training.

[Fig. 6. Encoded image of Fantasy Raiders in-game levels]

Automated level generation

Thanks to the difficulty assessment, we could be confident that a neural network could learn the features of our levels. Building on what the AI had learned in the previous stages, we moved on to the next step: level generation.

Image, speech and text generation are areas of active research. Since images were used as input to our model, we started with GANs (Generative Adversarial Networks), which are widely used and show good results in many cases.

[Fig. 7. Classification of generative models - Ian Goodfellow (2016), Figure 9. "NIPS 2016 Tutorial: Generative Adversarial Networks"]

GAN (Generative Adversarial Networks)

Since the first GAN implementation in 2014 and the combination of GAN with CNN in the DCGAN model at the end of 2015, many versions of GAN have been created. More than ten of them can be used to generate images. (If you are interested in how complex the results generated by GANs can be, take a look at the-gan-zoo.)

[Fig. 8. Anime characters generated by GAN - Yanghua JIN, "Various GANs with Chainer" ]

When training the model to generate levels, we used encoded images, just as we did for the difficulty assessment. After training, the generator produced an image that the decoder could convert into a level.

[Fig. 9. The GAN training process for level generation. The Generator and the Discriminator are both neural networks. The Generator tries to fool the Discriminator by passing off program-generated levels as designer-made ones, while the Discriminator tries to tell designer-made levels apart from generated ones. By repeating this process, the Generator learns to produce levels that look more and more like those created by the game designer.]
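
The contest between the two networks is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014). Here x is a designer-made level image, z is random noise fed to the generator G, and D outputs the probability that its input is designer-made. Mapping these symbols onto level images is an interpretation for illustration, not notation taken from the project.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The Discriminator pushes both terms up by classifying correctly, while the Generator pushes the second term down by making D(G(z)) close to 1.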

[Fig. 10. Generation of levels after completion of training.]

What worried us most was the limited amount of data we had collected: only about a thousand levels, whereas even the simple MNIST database contains more than 60,000 records. The first attempt, using DCGAN, ended in failure. Most of our other attempts, with newer GAN models that had shown impressive results in image generation, also failed to generate levels. Even when they succeeded to some degree, they generated only very limited types of levels.

[Fig. 11. Examples of images obtained after unsuccessful training. Each sample contains 8 x 8 levels. After 5000 iterations with a batch size of 16 (left). After 50,000 iterations with a batch size of 16 (right).]

At that point we almost gave up, thinking that the failure was caused by the small amount of data. However, training finally succeeded with DRAGAN, a variant designed to make GAN training more stable.

[Fig. 12. Level images generated by DRAGAN trained on 1000 records. Only the simplest levels were generated at this stage.]

However, due to the small amount of data, DRAGAN still could not generate more complex levels.

Data augmentation

Although 1000 levels is a rather small amount for machine learning, it had taken two game designers several years to create them. We could not quickly increase the amount of source data. So we used a method common in other areas of machine learning, data augmentation: we increased the data set from 1000 to 6000 levels by rotating each level by 90, 180 and 270 degrees and by replacing the types of NPCs, objects and items.
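
The two augmentation steps can be sketched on a small grid as follows. The tile names and the specific swaps are hypothetical, and the exact mix of rotations and replacements that produced the 6000 levels is not stated in the article.

```python
def rotate_cw(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def replace_kind(grid, old, new):
    """Return a copy of the grid with every `old` tile swapped for `new`."""
    return [[new if cell == old else cell for cell in row] for row in grid]


def augment(grid, swaps=()):
    """Return the original, its three rotations, and one variant per swap."""
    variants = [grid]
    for _ in range(3):
        variants.append(rotate_cw(variants[-1]))
    for old, new in swaps:
        variants.append(replace_kind(grid, old, new))
    return variants


level = [
    ["wall", "goblin"],
    ["floor", "fence"],
]
out = augment(level, swaps=[("goblin", "ogre"), ("fence", "trap")])
print(len(out))  # 6
print(out[1])    # [['floor', 'wall'], ['fence', 'goblin']]
```

Rotations preserve the layout logic of a level, while type replacements keep the layout and vary its content, so both yield plausible new training samples from one designer-made level.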

[Fig. 13. Example of data augmentation - the original version (left), rotated 90 degrees clockwise (middle), with the NPC replaced (right)]

After thousands of iterations on the 6,000-level data set, the model finally began to generate more complex levels.

[Fig. 14. The longer the training ran, the more complex the generated levels became. They grew similar to those created by game designers.]

[Fig. 15. Level created by game designer (left). Level generated by GAN (right).]

(Semi-supervised) CGAN

From the very beginning of Fantasy Raiders development, we studied generative grammars, hoping that automated generation would eventually replace manual level creation. So once we were convinced that the GAN could generate more complex levels, we applied CGAN, which generates data conditioned on given labels, to generate levels of a given difficulty.

As mentioned above, we had only a thousand levels whose difficulty had been rated manually by our designer colleagues. With so little data we could not generate more complex levels, which is why we enlarged the data set through data augmentation. However, the 5000 added levels were too many for the designers to rate by hand.

We therefore turned to semi-supervised learning: only part of the data is labeled, and for every level in the data set the Discriminator determines whether it was created by a game designer or generated by the program. However, it does not determine the difficulty of the levels obtained through data augmentation. You can read more about this method in Improved Techniques for Training GANs.

[Fig. 16. Levels generated by the CGAN network with the same initial seed value but different difficulty settings.]
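
One way to picture this is the K+1 class discriminator from Improved Techniques for Training GANs: K classes for the difficulty ratings plus one extra class for generated samples. The sketch below computes the per-sample discriminator loss under that formulation. Whether the Fantasy Raiders model used exactly this loss is an assumption, and the probabilities are made up.

```python
import math


def discriminator_loss(probs, is_designer_made, difficulty=None):
    """Per-sample loss for a discriminator with K difficulty classes plus one
    extra "generated" class.

    probs: K + 1 class probabilities; the last entry is p(generated).
    difficulty: 0-based class index if the level was rated, otherwise None.
    """
    p_generated = probs[-1]
    if not is_designer_made:
        # Generated level: the discriminator should flag it as generated.
        return -math.log(p_generated)
    # Any designer-made level, rated or not: it should not be flagged.
    loss = -math.log(1.0 - p_generated)
    if difficulty is not None:
        # Rated level: also pick the right class among the K real classes.
        loss += -math.log(probs[difficulty] / (1.0 - p_generated))
    return loss


probs = [0.10, 0.60, 0.10, 0.10, 0.05, 0.05]  # made-up discriminator output, K = 5
print(discriminator_loss(probs, is_designer_made=True, difficulty=1))  # rated level
print(discriminator_loss(probs, is_designer_made=True))                # unrated level
print(discriminator_loss(probs, is_designer_made=False))               # generated level
```

Unrated levels still contribute to training through the real-versus-generated term, which is what lets the augmented data help even without difficulty labels.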

RNN (recurrent neural networks)

Level generation with GAN worked quite well. The model was good at learning the shape characteristics of a level, but it did not learn the levels’ contextual information.

[Fig. 17. The generative model using the GAN can learn any characteristics of the shape of the levels, but it cannot learn the contextual information of the levels. A fence built by a game designer (left) and a fence generated by a generative model using a GAN (right).]

There may be several reasons for this, the most important of which is the nature of the input data itself. In general, images consist of values where a small difference does not change much. In level editor images, however, the data is discrete, and any small difference in value can change a lot. Our input data (level images) was therefore more like a sentence than an image.

As a result, we turned to RNNs, which are used more often than other generative models for generating sentences. We chose LSTM, one of several RNN variants. (We recently learned that there is another study that uses GAN to generate discrete values when generating sentences, but we have not experimented with it yet.)
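
A small illustration of the difference, using hypothetical tile IDs for one channel of an encoded level:

```python
# Hypothetical tile IDs for one channel of an encoded level.
TILES = {0: "floor", 1: "wall", 2: "fence", 3: "goblin", 4: "ogre"}


def relative_change(a, b, max_value=255):
    """How far apart two channel values are, as a fraction of the value range."""
    return abs(a - b) / max_value


# In a photograph, 100 -> 101 is an imperceptible brightness change.
print(round(relative_change(100, 101), 4))  # 0.0039
# In an encoded level, the same one-step change swaps what the tile is.
print(TILES[2], "->", TILES[3])  # fence -> goblin
```

A generator that is off by one unit in a photograph produces the same picture; off by one unit here, it puts a monster where a fence should be.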

[Fig. 18. Generating Levels with RNN]

LSTM (Long Short-Term Memory Units)

To use LSTM, all levels must be converted into strings. Since we had already encoded all levels, the conversion caused no problems: we only had to split the encoded image apart and concatenate the pieces into a one-dimensional string.

[Fig. 19. Converting a level to a string for RNN training]

To generate a level of a given difficulty, we added the level’s difficulty to the string. After many training iterations, LSTM began to generate levels, having learned not only the shape characteristics but also the contextual information.

[Fig. 20. Levels generated by RNN - generated strings (top), levels decoded from the strings (bottom)]

The RNN model generated levels closer to those created by game designers, but it could not generate levels with a fence that properly enclosed the center or a corner of the level. It seemed that this required a better understanding of space.

At first we suspected that the hyperparameters were the cause and tried changing them, but nothing helped. The RNN could not generate levels that showed an understanding of two-dimensional space.
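
The conversion can be sketched as follows, assuming one character per tile, a single leading digit for the difficulty, and a row separator. The character codes, the `ROW_SEP` marker and the position of the difficulty digit are hypothetical; the article does not describe the actual string format.

```python
ROW_SEP = "|"  # hypothetical end-of-row marker


def level_to_string(grid, difficulty):
    """Serialize a grid of one-character tile codes, prefixed by its difficulty digit."""
    rows = ["".join(row) for row in grid]
    return str(difficulty) + ROW_SEP.join(rows)


def string_to_level(text):
    """Inverse of level_to_string: returns (grid, difficulty)."""
    difficulty = int(text[0])
    grid = [list(row) for row in text[1:].split(ROW_SEP)]
    return grid, difficulty


level = [
    list("####"),
    list("#g.#"),
    list("####"),
]
encoded = level_to_string(level, difficulty=3)
print(encoded)  # 3####|#g.#|####
```

Putting the difficulty first means that, at generation time, the sequence model can be seeded with the desired difficulty and asked to continue the string.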

PixelRNN (PixelCNN)

While searching for a method that could combine the benefits of RNN with a better understanding of spatial context, we found PixelRNN, an RNN-based approach to image generation. (We later switched to PixelCNN, which trains faster.)

PixelRNN takes images, not sentences, as input, but we had already built the whole pipeline for producing encoded images for GAN training.

And then, finally, PixelRNN began to generate new levels that showed a two-dimensional understanding of space along with other contextual information. These generated levels contained square and intersecting fences, which had become a signature feature of Fantasy Raiders. The levels could hardly be distinguished from those created by game designers.

[Fig. 21. Levels generated by PixelRNN with different fence positions and shapes.]
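
PixelCNN generates an image one pixel at a time in raster order, and it enforces that order with masked convolutions: each pixel may only see the pixels above it and to its left. The sketch below builds such a mask for a single channel. It follows the general masking idea from the PixelCNN literature and ignores the ordering between color channels; how the Fantasy Raiders model was actually configured is not described in the article.

```python
def causal_mask(kernel_size, include_center):
    """Build a kernel_size x kernel_size mask of 0/1 weights.

    A 1 means the convolution may look at that neighbor. Only pixels above
    the center row, or to the left of the center in the same row, are visible.
    include_center=False is mask type "A" (first layer), True is type "B".
    """
    c = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for row in range(kernel_size):
        for col in range(kernel_size):
            above = row < c
            left = row == c and col < c
            center = row == c and col == c and include_center
            if above or left or center:
                mask[row][col] = 1
    return mask


for line in causal_mask(3, include_center=False):
    print(line)
# [1, 1, 1]
# [1, 0, 0]
# [0, 0, 0]
```

Unlike a one-dimensional string, this context includes the row directly above the current tile, which is plausibly what helps the model close a fence around a region.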

Working together: human plus machine

Once automated level generation was up and running, we received weekly feedback from the game designers on the currently trained neural network model. We found it interesting that some designers were inspired by the machine-generated levels and tried to create new levels based on them.

This observation made us think it would be nice to create levels together: human plus machine. Inspired by the Sketch-RNN study, we added a mode to the level editor that lets a designer create a new level together with a machine learning model.

[Fig. 22. When the game designer makes a choice, the machine recommends an NPC, object or item.]

[Fig. 23. When the game designer makes several choices, the machine makes the remaining ones on the designer’s behalf.]

Fantasy Raiders is built on Unity, and so is the level editor. A level instance is therefore serialized to JSON and sent to the server; the server modifies it and sends it back to Unity, again as JSON.

(Unity provides developers with the Unity Machine Learning Agents SDK, which makes it possible to use reinforcement learning in Unity. However, its Python API has limited capabilities. We hope the Unity team will extend the Python API in the SDK so that it can be used not only for reinforcement learning but also for other machine learning methods.)
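
On the server side, the round trip might look like the sketch below. The `tiles` field, the use of null for cells the designer left open, and the `suggest` callback standing in for the trained model are all hypothetical; the article does not describe the actual JSON schema.

```python
import json


def fill_remaining(level_json, suggest):
    """Parse a level sent by the editor, fill empty tiles, and return JSON.

    `suggest` stands in for the trained model: it takes (grid, row, col)
    and returns a tile name. The "tiles" field name is hypothetical.
    """
    level = json.loads(level_json)
    grid = level["tiles"]
    for r, row in enumerate(grid):
        for c, tile in enumerate(row):
            if tile is None:  # the designer left this cell open
                row[c] = suggest(grid, r, c)
    return json.dumps(level)


request = json.dumps({"tiles": [["wall", None], ["goblin", None]]})
response = fill_remaining(request, suggest=lambda grid, r, c: "floor")
print(response)  # {"tiles": [["wall", "floor"], ["goblin", "floor"]]}
```

Keeping the exchange in plain JSON means the editor does not need to know anything about the machine learning framework running on the server.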

Summary

At the current stage of Fantasy Raiders development, level generation has become not just a technical problem for engineers but also a problem for game designers, who are already taking part in improving the training results. What helped us overcome all the obstacles, including those not mentioned here, was not only technical knowledge but also knowledge of our own game, Fantasy Raiders.

As stated above, we do not believe that our experiments amount to a general solution to the problem of generating levels with neural networks. We are still not sure that, at this stage, neural network generation can replace procedural content generation. However, it is hard to deny that it offers a fresh look at content generation techniques.

Wish us luck, so that we can tell you about new findings!

[Fig. 24. The first level of Fantasy Raiders generated by machine learning.]

Additional reading

Generative Models : a brief introduction to generative models.

Generative Adversarial Nets in TensorFlow : A concise and simple introduction to the work of GAN.

The Unreasonable Effectiveness of Recurrent Neural Networks : a concise introduction to RNN.

RNN models for image generation : an introduction to generating images with RNN.

Teaching Machines to Draw : a classic example of the use of a neural network in the field of creativity. This example shows that we should consider the neural network not only as an automation tool, but also as a source of inspiration.

AlphaGo : a source of inspiration for using neural networks in games.

Artificial Intelligence and Games : a look at game AI from a different angle. In particular, Procedural Content Generation via Machine Learning (PCGML) covers a lot of research on generating content for games using machine learning.

The article was written by Maverick Games programmers Seungback Shin and Sungkuk Park.

Source: https://habr.com/ru/post/350718/
