
Evolution of agents controlled by a neural network

Let's look at the environment: it contains "food" particles and agents. Using sensors, agents obtain information about the environment. If an agent gets close enough to a food particle, the particle is considered "eaten" and disappears, and at that moment a new food particle appears at a random place in the environment. The task of the group of agents is to collect food; efficiency is measured by the total amount of food collected.
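The "eaten" rule above can be sketched in a few lines of Java. Everything here (class and field names, the pickup radius) is illustrative and not taken from the project sources:

```java
import java.util.Random;

// Illustrative sketch of the "eating" rule: the class, field names, and the
// pickup radius are assumptions, not taken from the project sources.
class FoodRule {
    static final double EAT_RADIUS = 5.0; // assumed pickup distance
    static final Random RND = new Random();

    // If the agent is close enough, the particle is "eaten": it respawns at a
    // random place in the environment and the method reports success.
    static boolean tryEat(double agentX, double agentY, double[] food,
                          double envWidth, double envHeight) {
        double dx = food[0] - agentX, dy = food[1] - agentY;
        if (Math.hypot(dx, dy) > EAT_RADIUS) return false;
        food[0] = RND.nextDouble() * envWidth;  // new particle appears
        food[1] = RND.nextDouble() * envHeight; // at a random position
        return true;
    }
}
```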

Let's simulate a competitive environment to automatically search for the optimal behavior of a group of agents. The agents' behavior algorithm will be implemented as a neural network.

To simulate a competitive environment, you can use a genetic algorithm and conduct tournaments: each group of agents is assigned a neural network of a certain configuration (a "brain"), and after a fixed period of time the number of food particles eaten is recorded. The neural networks whose agents collected more food win; the winners form the basis of a new population of neural networks, and so on.

Of course, there are specialized algorithms for training neural networks, but I deliberately chose a genetic algorithm for this task.

Let's look at the details of our model.


The environment is characterized by its size (width and height), as well as by the number of agents and food particles. The environment is closed: agents cannot leave it.
Each food particle is characterized by its coordinates (x, y).
Each agent is characterized by a position (x, y) and a direction vector.

The agents' sensors receive the following readings from the environment: a signal indicating food nearby, the distance to the food particle, and the cosine of the angle between the agent's direction vector and the vector pointing at the food; plus a similar set for nearby agents (a presence signal, the distance, and the cosine of the angle between our agent's direction and the vector pointing at the other agent). An agent's ability to perceive the environment is limited to its field of view: roughly speaking, an agent can only see what is ahead of it.
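As a sketch of these sensor readings, here is how the distance, the angle cosine, and the "sees only ahead" restriction can be computed. The names, the viewing range, and the 45-degree half-angle of the viewing cone are assumptions, not values from the project:

```java
// Illustrative sketch of the sensor readings: distance, angle cosine, and the
// "sees only ahead" restriction. The viewing range and the 45-degree cone
// half-angle are assumed values, not taken from the project.
class Sensors {
    // Cosine of the angle between the agent's direction (dirX, dirY) and the
    // vector from the agent at (ax, ay) to a target at (tx, ty).
    static double cosToTarget(double ax, double ay, double dirX, double dirY,
                              double tx, double ty) {
        double vx = tx - ax, vy = ty - ay;
        double len = Math.hypot(vx, vy) * Math.hypot(dirX, dirY);
        return len == 0 ? 0 : (vx * dirX + vy * dirY) / len;
    }

    // The target is visible only if it is within range and inside the forward
    // viewing cone (here: within 45 degrees of the direction vector).
    static boolean visible(double ax, double ay, double dirX, double dirY,
                           double tx, double ty, double range) {
        return Math.hypot(tx - ax, ty - ay) <= range
            && cosToTarget(ax, ay, dirX, dirY, tx, ty) >= Math.cos(Math.toRadians(45));
    }
}
```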

The agent interacts with the environment by changing its own position and direction. That is, the sensor readings are fed to the input of the neural network, and from the output we read the angle by which the agent should rotate, as well as the amount by which its speed should change.
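A minimal sketch of this input/output loop: the two network outputs are applied to the agent's state, with the position clamped to the closed environment. Field and method names are illustrative, not from the project:

```java
// Sketch of applying the two network outputs (a rotation angle in radians and
// a speed increment) to the agent's state. Field names are illustrative.
class AgentState {
    double x, y, angle, speed;

    void applyOutputs(double deltaAngle, double deltaSpeed,
                      double width, double height) {
        angle += deltaAngle;
        speed = Math.max(0, speed + deltaSpeed); // assumed: no backward motion
        // The environment is closed, so the new position is clamped to its bounds.
        x = Math.min(width,  Math.max(0, x + speed * Math.cos(angle)));
        y = Math.min(height, Math.max(0, y + speed * Math.sin(angle)));
    }
}
```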



Genetic operations on the neural network


For convenience, the neural network can be transformed into a linear form: it is enough to write all of its parameters into a one-dimensional array. The parameters include the connection weights, as well as the type and parameters of the neurons' transfer functions. Crossover and mutation of such linear chromosomes are quite simple operations.
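A minimal sketch of these operations on a flattened chromosome (single-point crossover and per-gene mutation); the encoding details of the real project, such as how transfer-function types are stored in the array, are omitted:

```java
import java.util.Random;

// Sketch of genetic operations on the flattened ("linear") chromosome. How the
// real project encodes transfer-function types in the array is omitted here.
class Chromosome {
    static final Random RND = new Random();

    // Single-point crossover: the child takes the head of parent a
    // and the tail of parent b.
    static double[] crossover(double[] a, double[] b, int point) {
        double[] child = a.clone();
        System.arraycopy(b, point, child, point, b.length - point);
        return child;
    }

    // Mutation: each gene is perturbed with a small probability.
    static void mutate(double[] genes, double rate, double strength) {
        for (int i = 0; i < genes.length; i++)
            if (RND.nextDouble() < rate)
                genes[i] += (RND.nextDouble() * 2 - 1) * strength;
    }
}
```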


Model in action


We will observe the agents in an environment of 600 x 400, but to speed up the calculations, the tournament competitions are held with more compact parameters: an environment of 200 x 200, 10 agents, and 5 food particles.

A neural network of optimal configuration is obtained quite quickly. In the video, you can observe how the behavior of the agents changes as a result of tournament selection (the image looks a little sharper if you watch on YouTube):



Interesting observations


  1. The environment can be complicated by allowing food particles to move. Interestingly, the neural networks obtained for the environment with static food particles also show good results in an environment where the food can move (moving food particles are demonstrated at the end of the video).

  2. The environment in which the agents operate is partially observable (an agent can only see ahead of itself and cannot see the positions of all objects in the environment at once).
    The environment is also non-deterministic (food particles appear in random places and move in different directions).

    But as a result of "evolution", neural network configurations quite often emerge that "force" the agent to "spin around" while moving toward the goal (this can also be seen in the video). Such behavior is effective because there is always a chance that another agent will eat the targeted piece of food first, or that a new piece of food will appear along the way and can be eaten sooner.

    To generalize: the neural networks that help cope with the constraints of the environment turn out to be more efficient.



  3. Generally speaking, watching the agents' behavior, it sometimes seems that they act quite "naturally", like a group of bugs or a school of fish.

    I implemented the ability to save a neural network configuration to a file, and the following video demonstrates the various behavioral strategies I have observed:




If you want to experiment


  1. You must have the Java runtime installed

  2. The emulator can be downloaded from here.

  3. Running the emulator:
    java -jar simulator.jar

Here are the neural network configuration files that I used in the demonstration videos.

The emulator is equipped with a simple graphical interface. By clicking the left mouse button, you can add food, and with the right button, new agents.



Technical details


The architecture of the main components of the application is as follows:



The genetic algorithm runs in a separate thread; when it finishes, all agents are given the new neural network configuration, which changes their behavior on the screen. A neural network configuration can be saved to a file; the XML format is quite convenient for this.
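The post only says that XML is used for saving configurations; one simple way to do this with the standard library is java.beans.XMLEncoder, sketched below. The real project may serialize its configuration differently:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.*;

// Saving a flattened network configuration as XML. java.beans.XMLEncoder from
// the standard library is one simple way to do this; the actual project may
// serialize its configuration differently.
class ConfigIO {
    static void save(double[] weights, File file) {
        try (XMLEncoder enc = new XMLEncoder(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            enc.writeObject(weights);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static double[] load(File file) {
        try (XMLDecoder dec = new XMLDecoder(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (double[]) dec.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```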

Instead of a conclusion


Using a genetic algorithm to train a small neural network turned out to be a success.

Working on such a small task is a lot of fun: it is very interesting to watch the evolution and collective behavior of artificial creatures.

Links


  1. In this book you can read about the agent-based approach to building intelligent systems, about various kinds of environments, and about decision-making strategies:
    Stuart J. Russell, Peter Norvig, Artificial Intelligence: A Modern Approach (available in Russian translation)

  2. Project sources on GitHub

  3. Generic implementation of the genetic algorithm - source

Source: https://habr.com/ru/post/168067/

