Artificial intelligence through the example of a simple game. Part 2

This time the game chosen is "Snake".
A neural network library has been written in Go.
A learning principle that depends on the "depth" of memory has been found.
A server has been written for games between developers.

The essence of the game

Many people probably remember the game "Snake", which was a standard application on Nokia phones. Its essence is as follows: a snake moves around a field; it shrinks if it does not find food and grows if it does. If the snake bumps into an obstacle, it dies.

I slightly changed the rules: the snake does not die when it hits an obstacle; it simply stops and keeps shrinking. In addition, a snake can split in half. If a snake has only one body cell left and does not find food within 10 moves, it dies and turns into food.
We will train the bot that controls the snake. If the snake splits, the bot gets control of another snake, which in turn can also split.

The work is based on the snake experiment by cyber-biologist Mikhail Tsarkov.

Neural network

As part of the task, a neural network library was written in Go. To study how neural networks work, I used the foo52ru video diary and Tariq Rashid's book "Make Your Own Neural Network".

The CreateLayer(L []int) function creates a neural network with the required number of layers of the given sizes. A bias neuron is added to every layer except the last one. Data is fed to the first layer, and the result is read from the last layer.

Example:

 CreateLayer([]int{9, 57, 3, 1})

Here we created a neural network with nine inputs, two hidden layers of 57 and 3 neurons, and one output neuron for the result. Bias neurons are added automatically on top of the neurons we specified.

The library allows you to:

Feed data to the network input.
Get the result by reading the last layer.
Set the correct answers and train the network by adjusting the connection weights.

The initial connection weights are set to random values close to zero. The logistic function is used for activation.
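To make the description above more concrete, here is a minimal sketch of such a network: construction with bias neurons, small random weights, and a forward pass with the logistic function. It is not the author's library. The names network, newNetwork and forward, the seed parameter, and the weight range of [-0.1, 0.1) are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// network is a hypothetical stand-in for the library's type; it is not the author's code.
type network struct {
	// weights[l][j][i] links neuron i of layer l to neuron j of layer l+1.
	// The last index of every row belongs to the bias neuron of layer l.
	weights [][][]float64
}

// newNetwork plays the role of CreateLayer: sizes lists the neurons per layer,
// and a bias neuron is added to every layer except the last one.
func newNetwork(sizes []int, seed int64) *network {
	rng := rand.New(rand.NewSource(seed))
	n := &network{}
	for l := 0; l+1 < len(sizes); l++ {
		layer := make([][]float64, sizes[l+1])
		for j := range layer {
			layer[j] = make([]float64, sizes[l]+1) // +1 for the bias neuron
			for i := range layer[j] {
				// Small random weight in [-0.1, 0.1); the exact range is an assumption.
				layer[j][i] = (rng.Float64() - 0.5) * 0.2
			}
		}
		n.weights = append(n.weights, layer)
	}
	return n
}

// logistic is the activation function mentioned in the article.
func logistic(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// forward feeds data to the first layer and returns the values of the last layer.
// len(in) must equal the size of the first layer.
func (n *network) forward(in []float64) []float64 {
	act := in
	for _, layer := range n.weights {
		next := make([]float64, len(layer))
		for j, w := range layer {
			sum := w[len(w)-1] // the bias neuron always outputs 1
			for i, a := range act {
				sum += w[i] * a
			}
			next[j] = logistic(sum)
		}
		act = next
	}
	return act
}

func main() {
	net := newNetwork([]int{9, 57, 3, 1}, 1)
	out := net.forward([]float64{0.01, 0.01, 0.99, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01})
	fmt.Println(out) // a single value between 0 and 1
}
```

Training (adjusting the weights from the correct answers) is left out of this sketch to keep it short.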

Bot training

The bot receives as input a 9x9-cell area of the field with the snake's head in the middle. Accordingly, our neural network has 81 inputs. The order in which the cells are fed to the inputs does not matter: during training the network will "figure out for itself" what is located where.

To indicate obstacles and other snakes, I used values from -1 to 0 (exclusive). Empty cells were encoded as 0.01 and food as 0.99.
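The encoding described above could be sketched as follows. The cellKind type, the newField and encodeView helpers, the concrete obstacle value of -0.5 (any value strictly between -1 and 0 fits the description), and treating cells outside the field as obstacles are all assumptions made for this example.

```go
package main

import "fmt"

// cellKind is a hypothetical description of a field cell, used only in this sketch.
type cellKind int

const (
	empty cellKind = iota
	food
	obstacle // walls, other snakes, snake bodies
)

const view = 9 // side of the square area the bot sees

// newField returns an h-by-w field filled with empty cells.
func newField(w, h int) [][]cellKind {
	f := make([][]cellKind, h)
	for y := range f {
		f[y] = make([]cellKind, w)
	}
	return f
}

// encodeView turns the 9x9 area centered on the head into 81 network inputs.
func encodeView(field [][]cellKind, headX, headY int) []float64 {
	in := make([]float64, 0, view*view)
	half := view / 2
	for dy := -half; dy <= half; dy++ {
		for dx := -half; dx <= half; dx++ {
			x, y := headX+dx, headY+dy
			kind := obstacle // assumption: everything outside the field is an obstacle
			if y >= 0 && y < len(field) && x >= 0 && x < len(field[y]) {
				kind = field[y][x]
			}
			switch kind {
			case food:
				in = append(in, 0.99)
			case obstacle:
				in = append(in, -0.5) // assumed value inside the (-1, 0) range
			default:
				in = append(in, 0.01)
			}
		}
	}
	return in
}

func main() {
	field := newField(20, 20)
	field[10][11] = food // food one cell to the right of the head
	in := encodeView(field, 10, 10)
	fmt.Println(len(in), in[41]) // 81 0.99
}
```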

At the output of the neural network, 5 neurons were used for the actions:

move left along the x-axis;
move right;
move up along the y-axis;
move down;
split in half.

The bot's move was determined by the output neuron with the highest value.
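Choosing the move is a plain argmax over the five outputs. A minimal sketch, in which the actions slice and the argmax helper are names introduced for this example:

```go
package main

import "fmt"

// actions lists the five output neurons in the order given in the article.
var actions = []string{"left", "right", "up", "down", "split"}

// argmax returns the index of the largest value; on a tie the first one wins.
func argmax(out []float64) int {
	best := 0
	for i, v := range out {
		if v > out[best] {
			best = i
		}
	}
	return best
}

func main() {
	out := []float64{0.12, 0.48, 0.91, 0.30, 0.05}
	fmt.Println(actions[argmax(out)]) // up
}
```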

Step 0. Randomizer

First, a randomizer bot was created. This is what I call a bot that moves at random. It is needed to verify the effectiveness of the neural network: with proper training, the neural network should beat it easily.
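Such a baseline takes only a few lines. In this sketch, newRandomizer and actionCount are hypothetical names, and picking uniformly among the five actions is an assumption about how "random" was implemented.

```go
package main

import (
	"fmt"
	"math/rand"
)

const actionCount = 5 // left, right, up, down, split

// newRandomizer returns a bot that ignores the field and picks
// one of the five actions uniformly at random.
func newRandomizer(seed int64) func() int {
	rng := rand.New(rand.NewSource(seed))
	return func() int { return rng.Intn(actionCount) }
}

func main() {
	bot := newRandomizer(1)
	for i := 0; i < 5; i++ {
		fmt.Print(bot(), " ")
	}
	fmt.Println()
}
```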

Step 1. Learning without using memory

After each move, we adjust the connection weights for the output neuron that produced the highest value. The other output neurons are left untouched.

The following target values were used for training:

found food: 0.99
moved in any direction: 0.5
lost a body cell without finding food (10 moves are given for this): 0.2
stood still (hit an obstacle or got stuck): 0.1
stood still with one body cell left: 0.01
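One way to express "train only the neuron that made the move" is to build an error vector that is zero everywhere except at the chosen output. The sketch below assumes exactly that; the outcome type and the target and outputErrors helpers are hypothetical names, and only the five target values come from the list above.

```go
package main

import "fmt"

// outcome is a hypothetical classification of what happened after a move.
type outcome int

const (
	foundFood outcome = iota
	moved
	lostCell
	stoodStill
	stoodStillOneCell
)

// target maps an outcome to the training value from the list above.
func target(o outcome) float64 {
	switch o {
	case foundFood:
		return 0.99
	case moved:
		return 0.5
	case lostCell:
		return 0.2
	case stoodStill:
		return 0.1
	default: // stoodStillOneCell
		return 0.01
	}
}

// outputErrors returns target minus output for the chosen neuron and zero for
// all others, so a zero error leaves the other output neurons untouched.
func outputErrors(out []float64, chosen int, o outcome) []float64 {
	errs := make([]float64, len(out))
	errs[chosen] = target(o) - out[chosen]
	return errs
}

func main() {
	out := []float64{0.2, 0.7, 0.1, 0.3, 0.4}
	fmt.Println(outputErrors(out, 1, stoodStill))
}
```

The resulting vector would then be fed to the usual backpropagation step in place of the full error vector.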

After this training, the bots quickly began to beat the randomizer, so I set a new task: to create bots that would beat these trained ones.

A/B testing

To perform this task, a program was created that divides the snakes into two groups, depending on the configuration of the neural network. 20 snakes of each configuration were released onto the field.

All snakes under the control of one bot shared the same neural network. The more snakes a bot controlled and the more varied the situations they encountered, the faster the network was trained. If, for example, one snake learned to avoid dead ends, or to split in half after running into a dead end, then all the snakes of this bot automatically acquired these skills.

Changing the configuration of the neural network can give good results, but this is not enough. To improve the algorithm further, I decided to use a memory of several turns.

Step 2. Learning with memory

For each bot, I created a memory of 8 turns. The state of the field and the move chosen by the bot were recorded in memory. After each move, I adjusted the weights for all eight states that preceded it. For this, I used a single correction factor, independent of how deep in memory the move was. Thus, each move led to the weights being adjusted not once, but eight times.
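The memory described above can be sketched as a short queue of (state, move) pairs that is replayed after every move. The step and memory types and the train callback are hypothetical; the callback stands in for whatever weight adjustment the library performs.

```go
package main

import "fmt"

// step is one remembered turn: the network inputs and the chosen action.
type step struct {
	inputs []float64
	action int
}

// memory keeps the most recent turns, oldest first.
type memory struct {
	steps []step
	depth int // maximum number of remembered turns (8 in the article)
}

// remember stores a turn and forgets the oldest one when the memory is full.
func (m *memory) remember(s step) {
	m.steps = append(m.steps, s)
	if len(m.steps) > m.depth {
		m.steps = m.steps[1:]
	}
}

// replay calls train for every remembered turn, newest first,
// with the same target and the same correction factor for all of them.
func (m *memory) replay(target, factor float64, train func(s step, target, factor float64)) {
	for i := len(m.steps) - 1; i >= 0; i-- {
		train(m.steps[i], target, factor)
	}
}

func main() {
	m := &memory{depth: 8}
	for turn := 0; turn < 12; turn++ {
		m.remember(step{inputs: make([]float64, 81), action: turn % 5})
	}
	calls := 0
	m.replay(0.99, 0.1, func(s step, target, factor float64) { calls++ })
	fmt.Println(len(m.steps), calls) // 8 8
}
```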

As expected, the bots with memory quickly began to beat the bots that were trained without memory.

Step 3. Reducing the correction factor depending on the depth of memory

Then I tried reducing the correction factor depending on the depth of memory. The most recent move got the largest correction factor. For the move before it, the factor was smaller, and so on through the whole memory.

With a linear decrease of the correction factor over the depth of memory, the new bots began to beat those that used a single factor.

Next, I tried what I call a logarithmic reduction of the correction factor: the factor was halved for each additional turn of memory depth (strictly speaking, this is a geometric decay). Thus, moves made "long ago" had a much smaller impact on learning than "fresh" moves.

Bots with the logarithmic decrease in the correction factor began to beat the bots with the linear decrease.
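The three schedules compared in steps 2 and 3 can be written side by side. This is a sketch under assumptions: depth 0 means the most recent move, the exact linear formula is a guess (the article gives only the idea), and the function names are made up. The halving schedule is the one the article calls logarithmic.

```go
package main

import "fmt"

// constantFactor: the same correction factor for every remembered move (step 2).
func constantFactor(base float64, depth, size int) float64 {
	return base
}

// linearFactor: the factor falls in equal steps from base at depth 0
// to base/size at the deepest slot (assumed formula).
func linearFactor(base float64, depth, size int) float64 {
	return base * float64(size-depth) / float64(size)
}

// halvingFactor: the factor is halved for each turn of depth.
func halvingFactor(base float64, depth, size int) float64 {
	return base / float64(int(1)<<uint(depth))
}

func main() {
	const size = 8 // memory depth used in the article
	for depth := 0; depth < size; depth++ {
		fmt.Printf("depth %d: constant %.4f  linear %.4f  halving %.4f\n",
			depth,
			constantFactor(1, depth, size),
			linearFactor(1, depth, size),
			halvingFactor(1, depth, size))
	}
}
```

At depth 7 the halving schedule is already down to 1/128 of the base factor, while the linear one is still at 1/8, which illustrates how much less the old moves contribute.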

Server for bots

As it turned out, the bots can be "leveled up" indefinitely. So I decided to create a server where developers could compete with each other (regardless of programming language) in writing an effective algorithm for the snakes.

Protocol

For authorization, you need to send a GET request to the “game” directory and specify the user name, for example:

 .../game/?user=masterdak

Instead of "..." you need to specify the address of the site and the port where the server is deployed.

Next, the server will issue a response in JSON format with the session:

 {"answer":"Hellow, masterdak!","session":"f4f559d1d2ed97e0616023fb4a84f984"}

After that, you can request the map and the coordinates of the snakes on the field by adding the session to the request:

 .../game/?user=masterdak&session=f4f559d1d2ed97e0616023fb4a84f984
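A client might build these two requests and read the session as sketched below. The base address http://localhost:8080 is a placeholder, since the article does not give the real one, and gameURL, parseSession and sessionResp are names invented for this example. Only the standard library is used, and no request is actually sent.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
)

// gameURL builds the request address; an empty session gives the authorization request.
func gameURL(base, user, session string) string {
	u := base + "/game/?user=" + url.QueryEscape(user)
	if session != "" {
		u += "&session=" + url.QueryEscape(session)
	}
	return u
}

// sessionResp mirrors the two fields of the authorization response.
type sessionResp struct {
	Answer  string `json:"answer"`
	Session string `json:"session"`
}

// parseSession extracts the session identifier from the server's JSON response.
func parseSession(body []byte) (string, error) {
	var r sessionResp
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Session == "" {
		return "", errors.New("no session in response")
	}
	return r.Session, nil
}

func main() {
	const base = "http://localhost:8080" // hypothetical address, not from the article
	fmt.Println(gameURL(base, "masterdak", ""))

	// The response body is copied from the article; a real client would get it via net/http.
	body := []byte(`{"answer":"Hellow, masterdak!","session":"f4f559d1d2ed97e0616023fb4a84f984"}`)
	session, err := parseSession(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(gameURL(base, "masterdak", session))
}
```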

The server will produce something like this:

 {
   "answer": "Sent game data.",
   "data": {
     "area": [ ["...  ..."] ],
     "snakes": [
       {
         "num": 0,
         "body": [ { "x": 19, "y": 24 }, { "x": 19, "y": 24 }, { "x": 19, "y": 24 } ],
         "energe": 4,
         "dead": false
       }
     ]
   }
 }

The area field will indicate the state of the playing field with the following values:

 0 //  -1 // -2 // 2 //  1 //

This is followed by an array of the snakes that are under your control.

The body of the snake is in the body array. As you can see, at the beginning the whole body of the snake (including the head, which is the first cell) is in the same position, "x": 19, "y": 24. This is because at the start of the game the snake crawls out of a hole, which occupies a single cell on the field. Later on, the coordinates of the body and the head will differ.

The following structures (Go example) define all server response options:

 type respData struct {
 	Answer  string
 	Session string
 	Data    struct {
 		Area   [][]int
 		Snakes []struct {
 			Num    int
 			Body   []Cell
 			Energe int
 			Dead   bool
 		}
 	}
 }

 type Cell struct {
 	X int
 	Y int
 }
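As a usage sketch, the structures above can be filled with encoding/json, which matches JSON keys to field names case-insensitively, so no struct tags are needed. The decode helper and the sample constant are hypothetical, and the small "area" array in the sample is made-up placeholder data, because the article elides the real one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Cell and respData repeat the structures from the article.
type Cell struct {
	X int
	Y int
}

type respData struct {
	Answer  string
	Session string
	Data    struct {
		Area   [][]int
		Snakes []struct {
			Num    int
			Body   []Cell
			Energe int
			Dead   bool
		}
	}
}

// sample follows the article's response, except for the invented 2x2 "area".
const sample = `{"answer":"Sent game data.","data":{"area":[[0,0],[0,-1]],
"snakes":[{"num":0,"body":[{"x":19,"y":24},{"x":19,"y":24},{"x":19,"y":24}],
"energe":4,"dead":false}]}}`

// decode parses a server response into respData.
func decode(body []byte) (respData, error) {
	var r respData
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	r, err := decode([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range r.Data.Snakes {
		head := s.Body[0] // the head is the first cell of the body
		fmt.Printf("snake %d: head at (%d, %d), length %d\n", s.Num, head.X, head.Y, len(s.Body))
	}
}
```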

Next, you need to send the move that the snake makes by adding move to the GET request, for example:

 ...&move=u

u - move up;
d - move down;
l - move left;
r - move right;
/ - split in half.

The command for several snakes (for example, for seven) will look like this:

 ...&move=ud/urld

One character is one command. The request should contain a command for every snake under your control. Otherwise, some snakes may not receive a command and will continue their previous action.
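Assembling the move parameter from the network's decisions might look like this. The sketch assumes the action indices follow the order of the output neurons listed earlier (left, right, up, down, split) and that commands are given in the order of the snakes array; commands and moveParam are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// commands maps an action index to its protocol character:
// 0 left, 1 right, 2 up, 3 down, 4 split.
const commands = "lrud/"

// moveParam builds the value of the move parameter, one character per snake.
func moveParam(actions []int) (string, error) {
	var b strings.Builder
	for i, a := range actions {
		if a < 0 || a >= len(commands) {
			return "", fmt.Errorf("snake %d: unknown action %d", i, a)
		}
		b.WriteByte(commands[a])
	}
	return b.String(), nil
}

func main() {
	// Seven snakes: up, down, split, up, right, left, down.
	move, err := moveParam([]int{2, 3, 4, 2, 1, 0, 3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("&move=" + move) // &move=ud/urld
}
```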

The field is updated with an interval of 150 ms. If no command is received within 60 seconds, the server closes the connection.

Links

To avoid the Habr effect (overloading the server with a flood of visitors), anyone interested in taking a look should send me a message. In reply, I will send the IP address of my server. Alternatively, you can deploy your own server using the source code of the program.

I am not a specialist in programming or in neural networks, so I may have made gross mistakes. I am publishing the code "as is". I would be glad if more experienced developers pointed out the mistakes I have made.

Source: https://habr.com/ru/post/451070/
