AI development on the example of the game Dicey Dungeons

For about a month, I solved one of the most difficult technical problems of my new game Dicey Dungeons - perfecting the AI for the final release of the game. It was quite an interesting work, and much of it became new for me, so I decided to write a little about it.

To begin with, I will explain: I am not an expert in computer theory, but simply one of those who have studied programming sufficiently to create video games, after which I finished my studies, grabbing only what I needed. I can usually solve my problems on my own, but a real programmer would most likely not approve of my decisions.

I tried to write an article at a high enough level of abstraction so that the main ideas could be understood even by non-programmers. But I am not an expert in such things, so my explanation of the theory may be erroneous. Write me about it in the comments to the original, I will gladly make changes!
')
Well, let's begin by explaining the problem!

Task

In case you didn’t play Dicey Dungeons, I’ll briefly tell you about the game: this is an RPG with deckbuilding, in which each enemy has a set of weapon cards that perform different actions. In addition, they roll the dice! Then they put these cubes in service to cause damage, or create different status effects, or heal, or defend against damage, and the like. Here is a simple example of how a small frog uses a big sword and a small shield:

A more complex example: this Wizard of all purposes has a spanner (spanner) that allows you to put two dice together (that is, 3 + 2 will give 5, and 4 + 5 will give 6 and 3). He also has a hammer (Hammer), which imposes on the player the effect of “shock”, if we apply a six to it, and a tube for shooting peas (Pea Shooter), which does little damage, but it has a “countdown”, there is she acts a few moves.

Another important complication: the game has status effects that change the capabilities of opponents. The most important of them is Shock, which randomly turns off the weapon; the shock can be removed by using an additional die on it, and a “burning” (Burn) that ignites the cubes. While the cubes are burning, they can be used, but each use will cost 2 health points. This is what an intelligent Master of all trades does when I impose shock and burning on all his weapons and dice:

Of course, there are many other things in the game, but this is enough to get a general idea.

So, our task: how to make the AI choose the best action for his turn? How can he know which of the burning cubes to extinguish, which cube to use to relieve the shock, and which one to save for important weapons?

As he did before

For a long time, the AI in Dicey Dungeons had only one rule: he looked at all the weapons from left to right, determined the best cube that could be used for it, and then used it. It worked great, but there were exceptions. So I added new rules.

For example, I coped with shock, looking at all non-shock weapons, and choosing which cube I would use on it when the shock would have been removed, and then marked this cube as “reserved” for the future. I worked with burning dice like this: I checked if I had enough health to put out, and I chose randomly if I needed to do it.

I added rule after rule for everything I could imagine, and as a result I got an AI that seemed to work! In fact, it is surprising how well this interweaving of different rules showed itself - the AI in Dicey Dungeons may not always have made the right decision, but it was always at least acceptable. At least for a game still under development.

But over time, the system of constantly adding new rules began to crack at the seams. People discovered exploits that made AI behave foolishly. For example, with the right approach, it was possible to outwit one of the bosses so that he would never attack a player. The more rules I added to correct the situation, the more strange things started to happen - some rules contradicted others, and boundary cases began to appear.

Of course, one of the solutions was to add new rules, review each task one by one and create new if structures for processing them. But I think that this way I just pushed aside the true solution of the problem. The limitation of the system was that it was only concerned with one question: “What will be my next move?”. She never looked ahead and tried to guess what could come out of a particular smart combination.

So I decided to start over.

Classic solution

Try searching for information about AI for games, and most likely the first thing you will encounter is the classic solution - creating a minimax algorithm. Here is a video about how it is used in the development of AI for chess:

The implementation of the minimax is as follows:

First, we create the simplest, abstract version of our game, which has all the necessary information for a specific point in time in the game. We call it the board . In the case of chess, these are the current positions of all the pieces. In the case of Dicey Dungeons, this is a list of dice, weapons, and status effects.

Then we create a value function that measures how well the game is going for a particular configuration of the game, that is, for a particular board . For example, in chess, the board on which the figures are located in their initial positions is estimated at 0 points. The board on which you ate the enemy pawn has a value of 1 points, and the board on which you lost your own pawn is worth -1 points. And the board on which we mate the enemy will be evaluated in an infinite number of points, or something like that!

Then, from this abstract board we simulate all possible moves that we can make, which gives us new abstract boards. Then we simulate making all possible moves on these boards, and so on, as many steps as you like. Here is an excellent illustration of this solution from the freecodecamp.org site:

We create a graph of all possible moves that both players can make, and apply a value function to it to evaluate how the game is going.

And in this, Dicey Dungeons is different from the minimax: the minimax came from the mathematical theory of games, it is designed to find the best series of moves in the world, where the enemy seeks to maximize his score. The algorithm is called this because it minimizes the player’s losses when the opponent plays to maximize his winnings.

But what happens in the Dicey Dungeons? Actually, I don't care what my opponent does. In order for the game to be exciting, it is enough for artificial intelligence to make logical moves - to determine the best way to use the cubes to the armament so that the fight is fair. In other words, only “max” is important to me, without “mini”.

That is, in order for the AI Dicey Dungeons to make a good move, it is enough for me to create this graph of possible moves and find the board with the highest rating, and then make moves leading to this point.

The simple move of the enemy

Well, move on to the examples! Let's look at the frog again. How can she decide what to do next? How does she know that the chosen action is the best?

In fact, she has only two options. Put 1 on the wide sword, and 3 on the shield, or do the opposite. She obviously decides that it is better to put on the sword 3, not 1. But why? Because she studied all possible results:

If you put on the sword 1, then we get 438 points. If you put 3 on it, you get 558 points. Wonderful! So, I get more points by putting 3 on the sword, the problem is solved.

Where do these glasses come from? The evaluation system in Dicey Dungeons currently takes into account the following aspects:

Damage: The most important factor is 100 points for every point of damage dealt.
Poison: an important status effect that the AI considers almost as important as damage - 90 for each poison.
Creating other status effects: for example shock, burning, weakening, etc. Each of them is worth 50 points.
Bonus status effects: adding positive status effects to the player, such as defense and the like, costs 40 points each.
Use of weapons: the use of any type of weapon is worth 10 points, because if nothing else succeeds, the AI simply has to try to use everything.
Reducing the countdown: for the activation of certain types of weapons (for example, for Pea Shooter) just enough total amount on the cubes. Therefore, the AI gets 10 points for each point of countdown reduced by it.
Points on dice: the AI gets 5 points for each unused point on the dice, that is, 1 is worth 5 points, and 6 - 30 points. This is done to ensure that the AI prefers not to use cubes, which are not necessary to use, due to which its moves become very similar to human ones.
Duration: the AI loses 1 point per turn, so long turns have a little less value than short ones. This is done so that in the presence of two moves, otherwise having the same value, the AI chooses the shortest one.
Treatment: it costs only 1 point for one restored health point, because even though I want the AI to consider it important, it doesn’t really follow its health. There are always better things to do!
Bonus points: you can add them to any turn to force the AI to do something that he otherwise would never have done. Used very sparingly.

And finally, there are two special cases - if the attacked target ends in health, then it is worth a million points. If the health ends at the AI, then it costs minus a million points. This means that the AI will never accidentally kill itself (say, by extinguishing a die at very low health), or it will never miss a move in which it can kill a player.

These numbers are not ideal - take, for example, the current open issues: 640 , 642 , 649 , but this is not very important. Even approximately exact numbers are enough to encourage AI to do more or less correctly.

More difficult moves of the enemy

The case of the frog is so simple that even my horrible code can calculate all the options in just 0.017 seconds. But then the situation becomes more complicated. Let's look again at the example of the Jack of all trades.

Its decision tree is "slightly" more complicated:

Unfortunately, even in relatively simple cases, an explosion of complexity occurs rather quickly. In this case, in our graph, there are 2 670 nodes that need to be investigated, and it takes much longer than in the case of a frog — perhaps one or two seconds.

This is largely due to the combinatorial complexity - for example, it does not matter which of the twos we use to take a shock initially, the algorithm considers this as two separate solutions, and creates for each a complete tree of branching solutions. As a result, we get a branch, the duplication of which is absolutely not necessary. There are also similar combinatorial problems when choosing cubes to pay off, to remove the shock from the armament and the order of their use.

But even if we find and optimize such unnecessary branches (which I do to a certain extent), there will always be a point at which the complexity of all possible rearrangements of solutions leads to huge, slow decision trees, the evaluation of which will take an infinite amount of time. So, this is the first serious problem of this approach. Here is another one:

Master key Divides the cube into two.

This important type of weapon (and similar ones) causes AI problems, because the result of its use is uncertain . If I put a six on it, I can get five and one, or four and two, and maybe two triples. I will not know this until I do it, so it is very difficult to create a plan that will take this into account.

Fortunately, Dicey Dungeons uses a great solution to both of these problems!

Modern solution

The Monte Carlo method for tree searching (Monte Carlo Tree Search, MCTS) is a probabilistic decision-making algorithm. Below is a slightly weird video that, nevertheless, explains very well the principle of decision making based on the Monte Carlo method:

In fact, instead of adding every possible move to the graph, the MCTS checks the sequences of random moves and then keeps track of those that have performed better. Thanks to a formula called Upper Confidence Bound, he can magically determine which branches of the decision tree are "most promising":

By the way, I took this formula from a very useful article about searching trees by the Monte Carlo method . Do not ask me how it works!

Surprisingly in MCTS, to find the best solution, we usually do not need to do a blunt search of everything, and we can use the same abstract board / move simulation system as in the minimax. That is, we both use both algorithms. This is the scheme I used in Dicey Dungeons. First, it tries to complete the deployment of a decision tree, which usually does not take much time and leads to the best result. But if the tree seems too big, then we fall back to using MCTS.

MCTS has two very cool properties that are perfect for Dicey Dungeons:

First, the method works ideally with uncertainty. Since it is executed again and again, collecting data from each run, I simply allow it to simulate indefinite moves, for example, the use of a master key, naturally, and after a lot of runs, the method creates a fairly correct range of points resulting from this move.

Secondly, he can give me a partial solution. In fact, when working with MCTS, you can perform as many simulations as you like. Theoretically, if it is performed infinitely, it will converge to exactly the same results as the minimax. However, it's more important for me that I can use MCTS to get a good solution for a limited amount of thinking time. The more searches we do, the better the “solution” found will be, but in the case of Dicey Dungeons, only a few hundred searches that take a fraction of a second are often enough.

Interesting related topics

So, this is how the enemies in Dicey Dungeons decide how to kill you! I want to add this system to the next version of v0.15 game!

Where did the graphs that I showed come from, including on twitter:

I created them by writing an exporter for GraphML , an open source graph file format that can be read by many different tools. (I enjoyed the excellent yEd , which I highly recommend.)

Part of the solution to this problem was to allow the AI to simulate moves, which in itself is an interesting puzzle. As a result, I implemented an action scripting system. Now when opponents use different types of weapons. they execute these small scripts:

These small scripts are executed by the hscript parser and haxe-based expression interpreter. This part was difficult to implement, but the efforts were justified: it made the game super-convenient for creating mods. I hope that after the release of the game, people can use this system to develop their own weapons, that is, they can add to the game almost everything they can imagine. In addition, since the AI is smart enough to evaluate any action transferred to it, enemies will be able to figure out how to use any modified weapons that players will create!

Source: https://habr.com/ru/post/432062/

All Articles