Hello!
It has been a full year and a half since I wrote my first article on Habr. Since then, the FOTM project has undergone several changes. First we will briefly go through all the upgrades, and then proceed to a detailed analysis of the main feature: the AI.
In this first part of my story, I will tell you about the latest changes in the project and my attempt to implement AI using a neural network. In the next article you will learn about the decision tree and plans for the future. So, let's begin!
It would be nice to do ...
After the release of the first article, I received much-needed feedback and began implementing the things that seemed the most reasonable and not too complicated:
- Chat. Since the game is almost completely built on sockets, creating a chat was quick and easy.
- Draws. Now, after 50 moves, the game ends in a draw, so as not to waste the opponent's time and nerves.
- Videos. My friend recorded several training videos on character customization and combat mechanics. They are now available on the game's YouTube channel.
- Gulp. Armed with new knowledge about this build tool, I made the client lighter and slightly faster.
- Moving the logic to the server side. This task turned out to be very difficult and painstaking. Most of the mechanics lived on the client side, which is fundamentally wrong, and the bulk of the feedback on the game concerned precisely this aspect. Nevertheless, the move not only made the game less accessible to malicious hackers, but also made it possible to fix some terrible things.
The above improvements took me about six months of lazy work. As soon as the Habr effect had passed, I realized that people entering the game had no one to play with. They would write in an empty chat, stand in the arena queue and... close the game. After all, if you are alone on the server, there is nobody to fight :-(
Neuro-FOTM
To be honest, I was ready to abandon the project altogether, since it had already brought me its fruits and experience. But machine learning had always interested me, so I started digging for information on neural networks. After some time I came to a conclusion: I needed a training environment for the network, and my game, with its grid of cells, was ideally suited for raising learning bots.
First, I implemented a function that forms the list of actions available on each turn. There are 3 kinds:
- Movement
- Use of ability
- End of turn
For each action, a resource is spent: energy, whose amount is limited per turn. Imagine that a player has 1000 energy at his disposal; movement costs 300 energy, and using the “fireball” ability costs 400. This means you can perform, for example, the following sequence: Move -> Move -> Fireball -> End of turn. After that, the character's energy is replenished to the maximum, and the next player takes his turn.
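In code, forming that list can be sketched roughly like this (the field and action names are my illustration, not the game's actual code):
let MOVE_COST = 300; // hypothetical cost of one step

function getAvailableActions(character) {
    let actions = [];

    // Movement, while there is enough energy for one step
    if (character.energy >= MOVE_COST) {
        actions.push({ type: "move" });
    }

    // Any ability whose cost fits into the remaining energy
    character.abilities.forEach(function (ability) {
        if (character.energy >= ability.cost) {
            actions.push({ type: "ability", ability: ability });
        }
    });

    // Ending the turn is always possible; it restores energy to maximum
    actions.push({ type: "endTurn" });

    return actions;
}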
For the sake of experiment, I made the choice of action random, to watch two crazy bots do all sorts of nonsense :)
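Given the list from the sketch above, the random bot needs only one line:
let action = actions[Math.floor(Math.random() * actions.length)];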
After that, the question arose of how exactly the AI should choose an action. I considered various options and in the end arrived at the following idea: the neural network accepts normalized information about the situation on the battlefield as input and outputs the following behaviors:
- Offensive move. Moving to the cell that is most advantageous from the attack point of view (the point at which the maximum number of opponents is within the optimal distance for using offensive abilities).
- Defensive move. Moving to the cell that is most advantageous from the defense point of view (the point at which the maximum number of allies is within the optimal distance for using protective abilities).
- Offensive. Using an ability to deal damage to the enemy.
- Defensive. Using an ability to help the character or an ally survive: healing or removing negative effects.
- Control. Using an ability to limit the enemy's actions: stunning, slowing, blocking the use of abilities, etc.
- Gain. Using abilities that increase the characteristics of the character or an ally.
- Weakening. Using an ability to reduce the enemy's characteristics.
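Collected into an array (in the order used in all the snippets below), the behaviors look like this; the constant name is mine:
let BEHAVIORS = [
    "offensiveMove", "defensiveMove", "offensive",
    "defensive", "control", "gain", "weakening"
];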
For the neural network model, I chose the perceptron: simple to understand and well suited to my task.
Multilayer perceptron
The input is an array of data describing the situation. Consider a small piece of the sequence of values:
let input = [..., 1, 1, 0.8, 1, 0.5, 0.86, ...];
The six numbers in the example show the ratio of current health to maximum health for 6 characters (the active character, 2 allies and 3 opponents). At the output of the neural network you will see the following:
let output = [0.2, 0.1, 0.7, 0.45, 0, 0.01, 0.03];
This is an array of those very 7 behaviors I described above. That is, in this case the neural network assessed the situation and concluded that it is best to attack the enemy (0.7), defend oneself or an ally (0.45), or simply move to a cell closer to the enemy (0.2).
Each ability in the game has a separate useClass property that classifies it.
The ability "Prowler" deals damage and stuns the enemy for 7 turns
For the ability "Prowler" this property is as follows:
useClass: {
    "offensiveMove": 0,
    "defensiveMove": 0,
    "offensive": 1,
    "defensive": 0,
    "control": 1,
    "gain": 0,
    "weakening": 0
}
And in the form of an array:
let abilityUseClassArray = [0, 0, 1, 0, 1, 0, 0];
To determine how well “Prowler” suits the current situation, I take the solution obtained from the neural network (output) and compare the two flat arrays:
let difference = 0;
for (let i = 0; i < output.length; i++) {
    difference += Math.abs(output[i] - abilityUseClassArray[i]);
}
The lower the difference value, the more likely this ability will be used. If the arrays are completely identical (the behavior matches the ability's useClass properties 100%), the difference will be equal to 0. Then it only remains to choose the action for which the difference is minimal.
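Putting it together, the final choice can be sketched like this (availableAbilities and useClassArray are illustrative names):
function chooseAction(output, availableAbilities) {
    let best = null;
    let bestDifference = Infinity;

    availableAbilities.forEach(function (ability) {
        let difference = 0;
        for (let i = 0; i < output.length; i++) {
            difference += Math.abs(output[i] - ability.useClassArray[i]);
        }
        // Keep the ability whose useClass vector is closest to the network's output
        if (difference < bestDifference) {
            bestDifference = difference;
            best = ability;
        }
    });

    return best; // a difference of 0 would be a perfect match
}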
Everything seems beautiful and clear, but there are a number of problems.
Problem 1: Normalization
To form the input data array, the values had to be normalized into the range from 0 to 1. With the health ratios mentioned above, everything turned out quite easy. It is harder with non-constant values, such as the temporary effects imposed on a character (buffs and debuffs). Each of them is an object with several important fields, such as the time remaining and the effect multiplier (stacks). To make the neural network understand how one effect differs from another, I had to introduce the same useClass field as for abilities. That allowed me to describe an effect, but the problem of their number remained. For that, I took the number of buffs and debuffs imposed on the character and normalized it as:
buffs.length / 42
This solution tells the neural network practically nothing about the properties of the objects inside the buffs array. On average, a character has 2-3 effects on them. The bar of 42 cannot be exceeded, because there are only 6 characters in a battle, with 7 abilities each. As a result, the normalized description of the game situation is an array of about 500 values.
One could reserve 42 sequences of values for describing effects (filling them with zeros when an effect is absent). But even if, say, 10 properties fall on each, that already makes 420 values (and that is only for the buffs). So I postponed this question for a while :)
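For reference, the per-character part of such a normalization might look like this; it is only a sketch, and the field names are my assumptions:
let MAX_EFFECTS = 42; // 6 characters * 7 abilities each

// Field names here are assumptions for illustration
function describeCharacter(character) {
    return [
        character.health / character.maxHealth,   // health ratio, 0..1
        character.buffs.length / MAX_EFFECTS,     // number of buffs, 0..1
        character.debuffs.length / MAX_EFFECTS    // number of debuffs, 0..1
        // ...the rest of the character's normalized properties
    ];
}

// The full input is the concatenation over all 6 characters in the battle
function describeSituation(characters) {
    let input = [];
    characters.forEach(function (character) {
        input = input.concat(describeCharacter(character));
    });
    return input;
}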
Problem 2: The Training Sample
To form a training sample, I had to manually fill in the output values for a number of situations. I implemented a UI that showed all the actions available on the current turn. The selected action was recorded in a separate JSON file as the solution (output) for the given set of input values (input). In one sitting I managed to form about 500 input-output pairs, which became the training sample. But the main question continued to hang in the air: how large should the sample be?
Moreover, if for some reason I decided to change the description of the situation (which did happen), everything would have to start over. For example, if the input data array consists not of 520 but of 800 values, the entire old sample can be thrown into the dustbin along with the network configuration.
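Recording one pair from that UI might look roughly like this (a sketch that reuses describeSituation from the Problem 1 sketch; the file name is hypothetical):
let fs = require('fs');
let trainingSet = [];

function recordSample(characters, selectedAbility) {
    trainingSet.push({
        input: describeSituation(characters),    // ~520 normalized values
        output: selectedAbility.useClassArray    // the behavior chosen by hand
    });
    fs.writeFileSync('training-set.json', JSON.stringify(trainingSet));
}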
Problem 3: Network Architecture and Configuration
So, we have about 520 values in the array of input parameters and 7 values at the output. To implement the neural network, I chose the Synaptic.js library, and implemented the network as follows:
var Architect = require('synaptic').Architect;
var network = new Architect.Perceptron(520, 300, 7); // 520 inputs, one hidden layer of 300, 7 outputs
This was the first network configuration. I launched it and... after 10,000 iterations and two hours, the network could not even come close to the target error value of 0.005. I started thinking about what could be changed to reach the target value. And I realized that everything was bad :(
Consider the available configuration options (a sketch of the training call follows the list):
- Sample size
- The number of hidden layers and the size of each of them
- Learning rate
- Number of iterations
- Error value
- The cost (evaluation) function (3 built-in options, or you can write your own)
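For context, here is how these knobs map onto Synaptic's Trainer. The iteration count and target error are the ones mentioned above; the learning rate and cost function are placeholder choices, not recommendations:
var synaptic = require('synaptic');
var Architect = synaptic.Architect;
var Trainer = synaptic.Trainer;

var network = new Architect.Perceptron(520, 300, 7);
var trainer = new Trainer(network);

// trainingSet is the array of { input, output } pairs from Problem 2
trainer.train(trainingSet, {
    rate: 0.1,               // learning rate (placeholder value)
    iterations: 10000,       // number of iterations
    error: 0.005,            // target error value
    cost: Trainer.cost.MSE,  // evaluation function (CROSS_ENTROPY and BINARY are also built in)
    log: 100                 // report progress every 100 iterations
});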
It’s quite difficult to understand how each of them affects the result, especially when you have been doing neural networks for all of 2 weeks. With a single hidden layer of only 10 neurons, an error of 0.01 is reached fairly quickly (in roughly 100 iterations), but suspicions about the flexibility of such a network creep in. Most likely, if you “feed” it an unusual game situation, it will make a completely unacceptable decision.
Problem 4: Training Speed
With the above configuration, network training lasted about two hours (approximately 1.38 iterations per second). That is quite a long time, considering that you need to experiment with the numbers to get a result. The problem was that the calculations ran on the CPU (Intel Core i5-4570), not on the video card. At this point, I wondered about moving the computation to the GPU using CUDA. I shoveled through a lot of material and concluded that the chances of setting up CUDA for Node.js on Windows are almost zero. Yes, one could deploy a separate Linux server that would only do network calculations, or write that server in Python instead of Node.js, and so on. But what if an AI built on a neural network is simply unsuitable for my problem in the first place?
Problem 5: Features of game mechanics
At the network development stage, I encountered two more problems with the chosen approach to implementing AI.
The “Lets me take it” ability description
- Not all abilities can be attributed to a single behavior. The most striking example is the oracle's “Lets me take it” ability. It “steals” a random positive effect from the opponent and applies it to the caster. The problem is obvious: there are many varieties of positive effects. Some heal, others protect allies, others boost combat characteristics, and some restrict the character's movement. If we steal a boosting effect, what is that? Strengthening yourself (gain) or weakening the enemy (weakening)? Essentially both. But the stolen effect may also heal, so this is already defensive behavior; and if healing was taken away from the enemy, then it is also offensive. Thus, “Lets me take it” falls under all the behaviors, which, of course, is very strange. And this ability is far from the only one with a random factor.
- Behavior is determined only for the current situation. The decision about what is best to do right now takes no account of the subsequent actions of either the active player or the opponents. There is no simulation of situations and no calculation of possible outcomes.
All the problems above made me doubt the correctness of the chosen approach to AI development. A colleague knowledgeable about machine learning advised using a decision tree instead of a neural network. We will talk about that in the next part of the article...
In the meantime, thank you for your attention!