
Training a deep neural network with reinforcement learning in tensorflow.js: tricks

Training deep neural networks from scratch is not an easy task.

It takes a lot of data and time, but a few tricks can speed up the process; I will discuss them under the cut.

Demonstration of solving a simple maze using these tricks. Network training time: 1 hour 06 minutes. The recording is sped up 8 times.

Each task needs its own set of tricks to speed up training. I will share a few tricks that helped me train the network much faster.

For theoretical background, I recommend the sim0nsays channel.
Here I will just tell you about my modest success in training neural networks.

Formulation of the problem


Approximate the target function by minimizing a quadratic loss function via backpropagation of error through a deep neural network.
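In tensorflow.js terms, the quadratic loss is the mean squared error between the desired and predicted outputs. A minimal sketch (not the author's code; the example tensors are made up):

```javascript
import * as tf from '@tensorflow/tfjs';

// Quadratic (MSE) loss: L = 1/N * sum((y - yHat)^2)
const target = tf.tensor2d([[0, 1, 0, 0]]);               // desired output for one step
const predicted = tf.tensor2d([[0.2, 0.7, 0.05, 0.05]]);  // what the network produced
const loss = tf.losses.meanSquaredError(target, predicted);
loss.print(); // a scalar that backpropagation drives down during training
```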

I had a choice of training strategy:
reward only the successful completion of the task, or reward the agent as it approaches completion.

I chose the second method for two reasons:


Neural network architecture


The architecture was developed experimentally, based on the designer's experience and a bit of luck.

Architecture for solving the problem:


The sigmoid output layer gives 4 probabilities in the range from 0 to 1; choosing the maximum, we get the direction for the next step: [jumpTop, jumpRight, jumpBottom, jumpLeft].
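The article does not list the exact layers, so the following is only a sketch of what such a model could look like in tensorflow.js. The input size and hidden layer sizes are my assumptions; only the 4-unit sigmoid output follows from the text:

```javascript
import * as tf from '@tensorflow/tfjs';

// Hypothetical input: agent coordinates, current cell value and four "antennae" values.
const inputDim = 7;

const model = tf.sequential();
model.add(tf.layers.dense({ inputShape: [inputDim], units: 64, activation: 'relu' }));
model.add(tf.layers.dense({ units: 64, activation: 'relu' }));
// 4 sigmoid outputs: [jumpTop, jumpRight, jumpBottom, jumpLeft]
model.add(tf.layers.dense({ units: 4, activation: 'sigmoid' }));
model.compile({ optimizer: tf.train.adam(0.001), loss: 'meanSquaredError' });

// Picking the move: index of the maximum of the four probabilities.
const state = tf.tensor2d([[5, 3, 12, 0, 13, 11, 12]]); // made-up example state
const action = model.predict(state).argMax(-1).dataSync()[0]; // 0..3
```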

Architecture development


Overfitting occurs when using overly complex models.

This is when the network memorizes the training data and performs poorly on new data it has not yet seen, because it never needed to look for generalizations: it had enough capacity simply to memorize everything.

Underfitting occurs with insufficiently complex models: the network does not have enough capacity to find generalizations in the training data.

Conclusion: the more layers and neurons a network has, the more data is needed for training.

Playing field




Rules of the game


0 - Entering this cell destroys the agent.
1..44 - Cells whose values increase with each step toward the exit.
The farther the agent has gone, the greater the reward it receives.
45 - The finish. No training happens here; training takes place only when all agents are destroyed. The finish is an exception that simply reuses the already trained network for the next prediction run from the very beginning of the maze.
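As an illustration of these rules (the helper below is mine, not from the article), a step can be scored from the value of the cell the agent lands on:

```javascript
// Hedged sketch of how a single move could be scored under the rules above.
function stepOutcome(field, x, y) {
  const inField = field[y] !== undefined && field[y][x] !== undefined;
  const value = inField ? field[y][x] : 0;
  if (value === 0) return { reward: 0, destroyed: true };      // wall or out of bounds
  if (value === 45) return { reward: value, finished: true };  // finish
  return { reward: value };   // the farther along the path, the larger the reward
}
```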

Description of parameters


The agent has “antennae” in four directions around it: they act as sensors of the environment and, together with the agent's coordinates and the value of the cell it stands on, form the description of its state.

This description is what the network uses to predict the agent's next direction of movement. In other words, the agent scans its surroundings in advance, so that the network learns to move toward cells with higher values and not to step outside the allowed area.
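The exact feature layout is not given in the article, so the following encoder is only an assumption about how such a description could be packed into an input vector (coordinates, current cell value and the four "antennae"):

```javascript
// Hypothetical state encoder: produces the 7-value input vector used in the model sketch above.
function encodeState(field, x, y) {
  const at = (xx, yy) =>
    (field[yy] !== undefined && field[yy][xx] !== undefined) ? field[yy][xx] : 0; // outside looks like a wall
  return [
    x, y,          // agent coordinates
    at(x, y),      // value of the cell the agent stands on
    at(x, y - 1),  // antenna: top
    at(x + 1, y),  // antenna: right
    at(x, y + 1),  // antenna: bottom
    at(x - 1, y),  // antenna: left
  ];
}
```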

The goal of the neural network: get more reward.
The goal of training: reward correct actions; the closer the agent gets to completing the task, the higher the reward for the neural network.

Tricks


The first attempts at training without any tricks took several hours and the result was far from complete. By applying a few techniques, the result was achieved in just one hour and six minutes!

Agent looping


During training, the network started making back-and-forth moves: the classic exploitation problem. Both moves give the network a positive reward, which stopped the exploration of the maze and kept it from escaping the local minimum.

The first attempt at a solution was to limit the number of moves per agent, but this was not optimal, since the agent spent a lot of time looping before self-destructing. The better solution was to destroy the agent if it moved to a cell with a lower value than the one it stood on, effectively forbidding it from going backwards.
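A sketch of that rule (the function and field names are mine, not the author's):

```javascript
// Destroy the agent if it steps onto a cell with a lower value than the one it left:
// this forbids moving backwards and breaks the back-and-forth loop.
function applyMove(field, agent, newX, newY) {
  const prevValue = field[agent.y][agent.x];
  const inField = field[newY] !== undefined && field[newY][newX] !== undefined;
  const nextValue = inField ? field[newY][newX] : 0;
  if (nextValue < prevValue) {    // moving backwards (or onto a 0 cell) is fatal
    agent.destroyed = true;
    return;
  }
  agent.x = newX;
  agent.y = newY;
}
```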

Explore or exploit


To explore the paths around the agent's current position, a simple trick was used: at every step, 5 agents become “volunteer” explorers. Their moves are chosen randomly rather than by the neural network's prediction.

This increases the likelihood that one of the five agents will advance farther than the rest and help train the network on better results.
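A sketch of that trick (the helper below is my own illustration, not the repository code):

```javascript
const EXPLORERS_PER_STEP = 5;

// Given the moves predicted by the network for all alive agents, a few randomly
// chosen "volunteers" take a random move instead of the predicted one.
function mixInExploration(predictedActions) {
  const actions = predictedActions.slice();
  const volunteers = new Set();
  while (volunteers.size < Math.min(EXPLORERS_PER_STEP, actions.length)) {
    volunteers.add(Math.floor(Math.random() * actions.length));
  }
  for (const i of volunteers) {
    actions[i] = Math.floor(Math.random() * 4); // random direction: 0..3
  }
  return actions;
}
```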

Genetic algorithm


Each generation, 500 agents take part on the playing field. Prediction is performed asynchronously for all agents at once, with the computations delegated to the GPU. This makes more efficient use of the computer's processing power and reduces the time needed to run the network's predictions for all 500 agents simultaneously.

Prediction works faster than training, so the network advances farther along the maze in less time and with better results.
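A hedged sketch of the batched prediction (names and shapes are my assumptions): all agent states are stacked into one tensor and predicted in a single call, which tensorflow.js runs on the GPU through the WebGL backend:

```javascript
import * as tf from '@tensorflow/tfjs';

tf.setBackend('webgl'); // in the browser, computations are delegated to the GPU

// One predict() call for the whole population of agents.
function predictActions(model, states /* array of 500 feature vectors */) {
  return tf.tidy(() => {
    const input = tf.tensor2d(states);                  // shape [500, inputDim]
    return model.predict(input).argMax(-1).dataSync();  // one direction per agent
  });
}
```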

Training on the best in a generation


Throughout the generation, the results of all 500 agents' progress through the maze are saved. When the last agent is destroyed, the 5 best agents out of the 500 are selected: those that got the farthest through the maze.

The neural network is then trained on the results of the best agents of the generation.

This also reduces the amount of memory used, since we neither save nor train on the agents that did not help the network advance.
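A hedged sketch of the idea (the per-agent history layout is assumed, not taken from the article): when the whole generation is dead, sort the agents by how far they got, keep the top 5, and fit the network only on their recorded states and moves:

```javascript
import * as tf from '@tensorflow/tfjs';

async function trainOnBest(model, agents, topN = 5) {
  // The agents that got the farthest are the best of this generation.
  const best = agents
    .slice()
    .sort((a, b) => b.lastCellValue - a.lastCellValue)
    .slice(0, topN);

  // Collect the states and one-hot encoded moves recorded along their paths.
  const states = [];
  const targets = [];
  for (const agent of best) {
    for (const step of agent.history) {   // hypothetical per-agent step history
      states.push(step.state);
      const oneHot = [0, 0, 0, 0];
      oneHot[step.action] = 1;
      targets.push(oneHot);
    }
  }

  await model.fit(tf.tensor2d(states), tf.tensor2d(targets), { epochs: 1 });
}
```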

Conclusion


Without being a specialist in this field, I managed to achieve some success in training a neural network, and so can you - go for it!

Strive to learn faster than computers, while we still do it better than they do.

Materials


Code Repository
Start learning in the browser
Documentation on tensorflow.js, where you can also find additional resources for learning.

Books



Thank you for your attention!

Source: https://habr.com/ru/post/452612/

