The road to victory at Russian AI Cup 2012

Hello, Habravchane!
I bring to your attention the history of your participation and victory in the final of the CodeTanks 2012 programming contest.

I found out about the competition on Habré, decided to find out more, went to the project site. I was delighted to write in C ++ under Linux without dancing with a tambourine. It was immediately thought that there would be a performance gain compared with participants writing in languages like Java / Python. Well, I liked the format of the competition itself: two weeks before the first round, further down the week break between rounds. You do not need to give birth to a properly working code in a creepy tseynote, but you can think over and program everything relatively calmly. Further study of the rules and viewing of fights on the site only strengthened the decision to participate: it is much more interesting for me to program AI in a complex and poorly defined environment than in a fully formalized type of board games.
')

Brief description of the game world

6 rectangular tanks drive and shoot at a rectangular field. First, every man for himself (round 1), then three teams of two (round 2) and two teams of three tanks (final). The field is rather small, the shelter appeared only in fights three for three, so the fights turn into real meat. To move, you need to independently control the right and left tracks, and the physics is quite complicated, something like Box2D (later reverse engineering showed that its Java analogue Phys2D is used). Tanks are rather cumbersome, they can only go straight ahead anyway, they almost stop during a turn. For firing, tanks have a turret with a cannon, and the turret rotational speed is much less than the rotation speed of the whole hull with the help of tracks. The speed of the shells is also small and at long distances it is quite possible to dodge.

Tanks have two types of life points: crew health and tank armor. A tank is considered dead if either of these two values reaches zero. The damage inflicted by the projectile is determined by its type, speed, side of the tank, which was hit, and the angle at which it happened. The projectile can penetrate or not penetrate the armor, in the second case the damage goes only on the armor, and in the first case it still goes to the crew. The crew health is a very important characteristic: with low health, the engine power transmitted to the tracks and the speed of reloading the implement are twice as low. You can restore health and armor by collecting bonuses that randomly appear on the map. In addition to the bonuses for the restoration of points of life there is a third type: a box with three premium shells. Such projectiles differ from the usual impossibility of rebounding, greater damage and lower speed.

The game is played until only one player’s tanks remain on the field, or 5000 tick time is reached. Moreover, the survivors are not guaranteed victory: points are being counted for inflicting damage on the enemy, for finishing tanks, for resurrecting their own (this is if you push the killed tank onto the recovery bonus), for killing all enemies, and the places are distributed according to the points collected.

Initial strategy

After analyzing the behavior of tanks, I decided to take as a model the movement along the broken line. That is, the tank moves straight to the turning point, then makes a turn in place and continues to move straight along the next segment. Accordingly, at each moment of time it is necessary to decide whether to continue moving further along a straight line or to begin a turn. For this, I started an array of directions, where I kept the distance to the obstacle along this direction. The zero direction in the array corresponded to the forward direction, the others were evenly distributed around the circumference. Based on the distance to the obstacles, as well as the presence / absence of bonuses in this direction, I evaluated each direction and chose the best one (the forward direction received an additional bonus). Further, based on the sector in which the best direction was located, I included one of several discrete gears on the tracks.

Local tests immediately showed the inadequacy of the discrete dialing: if the tank first saw the bonus almost straight along the course, and having approached closer, had already seen it at an angle, it sharply slowed down and began to turn. I rewrote the code so that in a small sector around the forward direction the power supply varied smoothly from “full forward” (1.0 1.0) to “fastest turn” (1.0 -1.0). The tank began to drive quite quickly and collect bonuses. I also tried to implement the simplest system of evading the attack vectors of enemy tanks, but did not take off. Having screwed the target selection system on the basis of the shortest time for aiming the gun, taking into account the body rotation and the simplest anticipation, I sent the first version of the strategy to the site.

Real fights forced to add movement backwards along with moving forward and taking into account the distance to enemy tanks when choosing a target. I also changed the code for calculating the lead time, adding the exact formula for the projectile flight time, but introduced a bug, which, as a result, completely zeroed the lead. Nevertheless, with this strategy, I got to the 200th place in the rating, and proceeded to implement a more complex algorithm, which I conceived almost from the very beginning of the competition.

Danger field

The main idea of the new algorithm was to build a “danger field” and move along it in order to minimize this very “danger”. The degree of danger was calculated on a regular grid with square cells (the size of the cells at different times varied from 10 to 16 units), taking into account the distance to enemy tanks, their visibility and the direction of the gun. The danger of being on the path of a flying projectile also seriously added. Specific formulas have changed many times, the coefficients were adjusted in them, so I will not give them here. In order for the tank to go around obstacles, the value of the degree of danger inside solid objects was set to a sufficiently large number. The resulting field was blurred by a circular filter with a radius of about 3.5 cells, which roughly corresponded to the size of my own tank.

To implement this functionality, we had to seriously address the geometry. Before that, I only had a vector class copied from another project and corrected. Now I have implemented the class of an arbitrarily rotated rectangle and the function of determining the overlap of two such rectangles. To check the visibility from a certain point of the enemy tank, I built a thin rectangle from this point to the location of the tank with a width of projectile diameter and checked its intersection with other objects.

Unlike many other participants in the competition, I was too lazy to fasten the graphical interface to my debugging strategy, so I spent the evening learning the capabilities of the regular Linux terminal. It found commands for specifying colors on a fixed 256-color palette (xterm-256color), in which there was a black and white gradient in 25 gradations. As a result, the ability to display a picture with the values of the hazard field in the terminal has become the chip of my strategy. At first, two spaces with the corresponding background color value corresponded to one cell, then I replaced the spaces with the U + 2584 () symbol and doubled the resolution of the image.

Movement through the field of danger

The resulting hazard field was used to select the direction of motion. The movement model remained the same - along the broken line, only the first segment was used to select the optimal direction. That is, I believed that the tank first turns into a certain angle on the spot, goes straight ahead at a constant speed, and stops, having traveled a certain distance. In the first versions of the algorithm, there was a separate branch in case the tank was already traveling fast, but it did not justify itself. Using the chosen motion model, I carried out the integration of the hazard field with the weight linearly decreasing with time:

Where

- field of danger, and

- position of the tank at the moment of time t , according to the movement model. The integration was carried out numerically, the time step was chosen such that the corresponding space step was in the area of the hazard field grid step. The value of the hazard field at arbitrary points was considered using bilinear filtering.

In the program, I chose the direction of motion, calculated the time required for rotation (in the constant speed approximation) and wrote down the field value at the starting point multiplied by the coefficient corresponding to the rotation time to the current integral value. Further, moving along the chosen direction, I updated the current value of the integral and considered the full integral for the case if we stopped at the last point considered. If a bonus fell on the way, I took some value from the current value of the hazard integral. Having found the optimal distance along one direction, I proceeded to the next. Driving in the chosen direction was organized similarly to previous strategies.

At first, I tried to realize the collection of bonuses by lowering the value of the hazard field in their position. This led to the fact that the tank began to strive to spend as much time as possible near the bonuses instead of eating them quickly and continuing to move. I had to organize bonuses in an array, sort by distance from the starting position of the tank and take into account the achievement of each bonus only once.

Although the tank became much smarter, learned to hide behind obstacles, it did not greatly affect the rating of the strategy. I made a slight decrease in danger in areas where at least one enemy is visible, so that my tank would not be ambushed at the end of the game. The tops of that time carried him out almost immediately: he traveled to the center of the field and received several hits at once, respectively, tried to increase the danger value in the center. Changed the weighting function when integrating danger: replaced the linear decay by the exponent. I found a bug in calculating the time of flight of the projectile - I fixed the lead.

The change of weight function to exponential was planned from the very beginning, but was postponed due to the complexity of implementation: there is no maximum time after which the weight is reset. It was necessary to replace the cycle stopping condition with a comparison of the lower estimate of the hazard integral with its minimum value found by the current moment.

Physically accurate movement and avoidance

After all this, I paid attention to the problems with the movement: sometimes the tank missed the bonus and began to ride around it. Since the calculation of the hazard integral was almost independent of the motion model, I decided to use a physicist close to reality. After experimenting with the local-runner, I got all the necessary constants (knowledge of the design of physical engines and the ability to determine the exponent by eye on the schedule helped greatly). Replaced a set of directions of motion to a set of pairs of power supplied to the tracks. In the algorithm for calculating the integral of danger, he took the field value in the coordinates corresponding to the predicted position of the tank. Although there was an instantaneous stop at the end point from previous versions, it did not affect the efficiency of the algorithm. The new motion algorithm has become a real breakthrough. I quickly moved to the 20-30th place in the ranking. Even then, it became clear that the key to victory is the exact simulation of physics.

Although the strategy has become much stronger, against the tops of this was not enough. I tried to reduce the danger value in the corners, adjust various factors, removed the instantaneous stop in the calculation of the hazard integral, and tried to increase the recursion depth: simulate more than one pair of tracks per track, and first apply the first, and after a while - the second, but it was not enough CPU time.

Then I decided to improve the dodging of shells. Prior to this, the projectile trajectory made a great contribution to the danger field and the tank tried to drive away. But the resolution of the field is quite low, and there is little time to dodge, because this method is ineffective. But I already had exactly the predicted position of the tank at future points in time (as well as the function of determining the overlap of the rectangles), so I could know for sure if the projectile would fall into my tank and, if so, at what angle. By adding the appropriate values to the hazard integral for specific trajectories, I could choose one on which there would be a minimum number of hits and they were more likely to be rebounded. This was the second breakthrough, again connected with exact physics. With this strategy, I climbed up to 10-20th places in the sandbox and took 3rd place in the first round.

And just to determine the hit / miss of the projectile in the tank was not enough. Serious win gave account of the angle at which the shells hit. Thanks to him, I began to reduce most of the hits at medium distances to ricochets. At the same time, I tried to take into account collisions with the walls in order to improve the dodge from the projectiles while being near the field boundaries, but I got absurd from the physical point of view results for the impulse transmitted at impact and postponed the implementation of this improvement.

Selection of constants

A week before the second round, I spent on refactoring and code optimization, on experiments with coefficients and on a system for synchronizing the firing of several tanks. The synchronization system did not justify itself, and therefore subsequently I gradually reduced the synchronicity coefficient until I turned it off completely. Also made an account of the orientation of the enemy tank when choosing a target, which brought me to first place in the sandbox. However, the second round showed that my strategy is relatively weak in a team battle.

In order to improve the situation, I modified the field of danger so that the tanks tried to keep close to each other, but not very close, and also decided to seriously address the problem of choosing the optimal constants in the algorithms. To do this, I implemented a system to automatically launch strategies with an arbitrary set of constants (thanks to ud1 for local-runner and Run.class reverse engineering for running strategies in any combination) and organized a genetic algorithm: there is a pool of strategies with different constants, they play paired Each of the battles with each one and by the results is discarded the worst and is copied with the mutations the best. It was based on the results of such optimization that my strategy began to aggressively attack the enemy. Also, the collective mind dealt with the physics of hitting the walls: in the engine Phys2D there was an error and the absurd expression I received for the transmitted impulse actually corresponded to reality. As a result, my tank became more intelligent to behave near the boundaries of the field.

Shooting with exact physics

But all this was not enough, but before the final there was less time left. And I decided to drastically improve the shooting system: to take into account the exact physics of enemy tanks when choosing the direction to the target. Since going through all the options is not enough resources, I decided to limit myself to cases when the tank delivers the same power to both tracks. These cases include full forward and full back, defining the main range of possible positions of the tank in the future. Since in these cases the angular momentum of the forces is absent, all the angular values are common, and only the position and speed of the tank differ. Moreover, the position can be set with two vectors, for example, the position for the “full forward” case and the “full back” position (I have a position at zero power on the tracks and a vector of the “full forward” position offset from zero). All other positions can be obtained by linear interpolation.

Knowing the range of possible positions of the tank at some point in time, as well as the position of the projectile at the same moment, I find the subrange in which the hit occurs, assign a certain amount of points to this subrange depending on the time and angle of hit, and proceed to the next point in time. Integrating over the full range of possible positions, I find the total number of points corresponding to this projectile. Looking through the possible directions of the shot and taking into account the time of the turn of the tower, I find the optimal and, if possible, shoot. This was the third breakthrough, again connected with exact physics. The updated strategy won the first place in the first part of the final, and during the break I improved it by correcting the intersection bug, taking into account points from projectiles already flying in order to block the widest possible range of positions, as well as dropping those positions in which the enemy tank turns out to be inside the obstacles.

To implement all of my plans, I again had to go into geometry, the previously written intersection of the rectangles was not enough. Now I found a range of positions when moving an arbitrarily oriented rectangle along a straight line at which this rectangle overlaps another stationary rectangle. Also separately implemented a special case when a moving rectangle has zero dimensions, that is, is a point.

Integrating across a range of possible positions was not exactly trivial. So that the tank would prefer to overlap the unclosed sectors, rather than adding the density of fire to where there is already an impact, I increased the effect on the integral of low points and weakened the influence of large with a power conversion:

I made the search for the best direction of the shot not by a simple search, but by an algorithm similar to the method of dividing in half. Having checked several equally spaced directions around the beam to an enemy tank, I chose the best one, reduced the angular pitch twice, and checked the directions around the one found. Repeating this procedure recursively several times and additionally checking the current direction of the gun, I switched to another enemy tank.

Unrealized ideas

Perhaps the most important improvement I missed is the automatic orientation of the tank sideways to the enemies. Due to the lack of this system, my tanks were relatively weak in long-range combat and I had to arrange a rush in which the risks were much greater. My physical education prompted me to replace the scalar hazard field with a quadrupole one, that is, to store five circular harmonics at each point. But this would slow down more than five times the sampling functions of the hazard field, and my strategies were already executed at the limit of the available processor time.

Another unrealized opportunity is to account for the collision of shells in the air. I was stopped here by the impossibility of confining myself to simply avoiding other projectiles when firing. A sufficiently large percentage of enemy shells get off their own, and if you add avoidance, you will have to write code to fire at dangerous enemy shells.

Watch records of fights in my profile on the competition website.
The source code of the final version of the strategy can be found here .
Constant optimizer code is here .

Source: https://habr.com/ru/post/161333/

All Articles