The history of victory at the annual competition Russian AI Cup 2015

I want to tell about my participation and victory in the annual competition on programming AI "Russian AI Cup 2015" from Mail.Ru Group. You can see the detailed rules of the competition and game entries on the russianaicup.ru competition website.

This year the competition was organized at a completely new level. Changes have occurred both in the scale of the game world in which the AI operates, and on the competition website. Thanks to the three-dimensional visualization, the games looked much more exciting. By entertainment, in my opinion, the competition significantly surpassed last year's hockey, and the "soldiers" of 2013.
')
The participant was asked to write an AI for driving in the race for survival. Like last year, the task was “with physics”. But this time the source of the "physics engine" was open. Yet, unlike last year, this time all random phenomena in the game world were vivid - random map, randomly placed bonuses. It was immediately obvious when luck was on your side, and when she turned away from you. In last year's hockey, even while watching the game with significantly different opponents, it was difficult to understand what was gained by chance or skill. I think this had a positive effect on the entertainment of the competition.

Brief description of the rules

The goal - to drive 2 laps on a closed highway the fastest. More precisely, you need to score the most points, but to come first is the main way to earn points. More points are given for collecting bonuses on the road and causing damage to opponents. The track, as a constructor, is assembled from square tiles, these are straight sections of the course, corners (turning the track 90 degrees), or intersections (T-shaped and regular). You have to drive along the key points ("tiles") of the route in a certain order - sometimes you have to make loops, sometimes you even have to go backwards. The machines also have the opportunity to pour fuel oil puddles behind them, to fire at each other with special projectiles (tires and washers), and to use a special “nitro” accelerator. Charges for all these devices are limited, and are replenished by picking up bonuses randomly scattered on the map.

I'll tell you how my AI works, and thanks to which (I think) managed to win.

First version

I participate in the competition not for the first time, every year I managed to achieve better results than in the previous one. From my experience, a very large part of the success is taken by properly organized development of your strategy. Since time is very limited, you need to think from the very beginning - what and in what order you will be doing. If you have to throw everything away and write again, there is little chance of winning.
I do not think that in my strategy there were any kind of “killer feature”, rather, many small successful solutions have developed into one big advantage. The story will be quite long with a detailed description of all these moments.

Regarding testing and debugging. In past years, it happened that I beat my head against the wall trying to understand why my algorithm does not provide the benefits I expected. And after spending a lot of time, I suddenly found a bug in the very heart of my algorithm - and everything flew up even higher than I expected. Even this year, I almost missed the victory because of bugs, I had to spend more time on the tests. This time, visualization has become for me the main way of debugging. I failed to come up with short but effective tests, and it was a pity to spend a lot of time writing long tests, which I would have to constantly rewrite. There was only a test for a physical simulator that compared the predicted trajectory and the real one. Without it, debugging the physics of collisions would be completely impossible.

The first version I decided to teach just drive through the track. But even this minimum, if it was implemented honestly, required a rather large amount of work, we needed: a physical simulator of free movement of the machine, collision detection with the sides, search for a path and a trajectory planner. The first three components were clear - how to do, the fourth is not very.

I started, as in the past, with writing a physical simulator. There were no problems - the source code of the physics engine was available. It immediately became clear that if you do 10 iterations for each tick as in the engine, then there may be problems with performance. Recalling last year’s winner’s article, I had doubts whether absolute accuracy was necessary. I decided to do everything exactly, and then, if necessary, redo it by 1 iteration instead of 10. It was the right choice, as it turned out. Then I began to detect collisions with fences. I was afraid to do collision detection in the engine, I decided that it would be too slow. Having considered several options, I stopped at this algorithm:

Each "tile" is divided into a square grid of 9 sectors. Being in each sector, the machine can encounter only with a certain type of fence located on one of 4 sides and, for some types, with a mirrored fence.

These types of fencing are:

Straight wall right / bottom / left / top
Inner corner
Outer corner
Circle
The wall, turning into a circle with one hand for the redistribution of the sector. There are 4 sides here and for each a mirror image
The circumference that passes into the wall on one side outside the sector. Here, too, 4 sides and mirror image

These 6 options cover all possible cases. Variants 5 and 6 are needed, since the machine can be in one “tile”, and the corner can collide with a fence in another “tile”. Next, a table was built for the entire map, where in each of the 9 sectors for each "tile" there was an object that could verify if the machine had collided with its fence. The implementation turned out to be quite effective, consuming less than a third of the time spent on the iteration. This made it possible to do a collision check at each iteration, rather than once every tick, without a significant loss in performance.

I'll tell you about another optimization. Since the angular velocity was involved in the motion, the first implementation calculated the sines and cosines at each iteration — this was slow. But since the angle of rotation of the wheels (affecting the angular velocity) changed only once in a tick, and the angular velocity of the collision rarely appeared, it was usually necessary at each iteration to simply add the same constant angle to the orientation angle of the machine. I kept the orientation angle of the machine with a vector and rotated it by a constant value at each iteration - these are 4 multiplications and 2 additions. Sines and cosines should be counted only once every tick and, in rare cases, after collisions at each iteration.

The search for the path made the simplest way - the wave algorithm. Then I wanted to rewrite Dijkstra’s algorithm in order to somehow take into account the presence of turns, bonuses, puddles of fuel oil. But never returned to this.

There remained the last component - the trajectory planner. I had to tinker with him, I think, in the total score for the entire competition, half the time was spent on him. The second half of the physical simulator, the rest was spent a minimum of time. At first I tried to solve the problem in the same way as before - sorting the trajectories of a certain format. The simplest format: drive N ticks straight, then M ticks turn the steering wheel in any direction, then return the steering wheel to the neutral position. It worked, but poorly and very slowly, the miscalculation of all the options for a hundred ticks ahead did not fit into the time limits. The machine did not understand that she needed to reorganize into the next row before the turn. Having spent a couple of days with this idea, I realized that I needed something else.

I decided to build a tree of trajectories, I think, many participants came to this idea. But it is probably of interest how I built it. I think I managed to win, thanks in large part to the successful algorithm for building this tree. The algorithm came up with himself, not based on something known. Do not judge strictly if you invented the bike. The tree was built in this way:

Construct several initial trajectories with a length of 450 ticks (the length here and further is measured in ticks, it did not immediately stop at 450). Further, many times (until the allotted time is over) the recursive method was called, which in the middle of any of the branches of the tree produced a “branching”. When “branching”, the trajectory section to the branch point became the trunk of the new subtree, the section after - one of the branches of the new subtree, and also new branches were added. The length of all branches was chosen so that the total length of the path from the root to any final state was constant or less and the branch ends in collision. In the first version, 4 variants of movement were used for “branching”. The gas is always on the floor, and the steering wheel is twisted in 4 positions: right, left, neutral, we keep it in the current position. Later options were added with braking and riding backwards. For this tree building algorithm, 3 evaluation functions were used:

Make a “branching” of the trunk of the current tree (always in the middle) if the average length of the branches of descendants is less than the length of the trunk. Otherwise, go to the selection of the child for a recursive call on it. I experimented a lot here - this option turned out to be the best.
Select the descendant of the current tree to recursively call on it. The child with the maximum value is selected: <average length of branches in the tree> * <third evaluation function> / sqrt (the sum of the lengths of all the branches of the tree). The division is necessary so that the tree does not degenerate: the more time is spent on the study of the branch, the greater the denominator and the lower the priority.
The main evaluation function, on the basis of which it was decided on which branch, we still go (which branch is better). In the first version, the value of this function was equal to the sum of the estimate for the trunk and the maximum estimate of the descendant. The reward was given for approaching the finish line - “taking the tile”. Later this function has changed a lot.

It turned out a fairly evenly branching tree with the priority of branching of those branches that led to the finish line. Unfortunately, I didn’t have pictures of that tree, but it hardly visually differed from the latest versions.

Here is the visualization of the latest version of my strategy on one of the maps in conditions of limited visibility. These are trajectories that are built in one tick. The best one is chosen from the set of such trees obtained for many ticks (about this and other techniques - in the article below).

Here I want to thank the organizers. This time, the world in which AI operated turned out to be much more complicated than in the past, and this prompted the participants to look for new algorithms and approaches. Such a tree of trajectories, I think, would work fine in last year’s hockey and in tanks.

Approximately 9 days before the first round, the first version was ready. She had already traveled normally and climbed to about 200x-300x seats. Then it became the frame of the strategy and did not fundamentally change, only new functionality was added.

Preparing for the first round

Although at that time there were still few maps, and they were simple, the machine was often stuck. Resting against the wall, because she did not know how to ride her ass. Switching to “smart” in case of failure of the main algorithm was ineffective. I did not write an emergency heuristic algorithm, since I planned to teach everything to the main one and did not want to waste time in vain. Added reversing the branching options for the root of the tree paths. At this stage, I have every tick built a tree of trajectories again. Adding branching options to all branches greatly reduced the branching of ordinary branches (resources are limited), the machine started to get worse. So I tried to add new branches only to the root and only if absolutely necessary (stuck). I learned to leave simple situations - there were fewer defeats. Rating crawled up, got to the second hundred.

Here I want to note that I tried to do all the improvements in such a way as to first of all solve the problem, the cause of the greatest number of defeats. This approach justified itself, since some problems did not need to be corrected - which saved time.

The next improvement was very simple, but very effective. Why throw out every tick trajectory found and look for it again on the next tick? Since the physical simulator gave an accuracy of up to 10-12 decimal places, the machine remained on the chosen trajectory with very high accuracy for dozens of ticks. Then I realized that perfectly accurate prediction of the trajectory would save me a lot of computational resources. Each tick of a new built tree was compared with the one saved on the previous tick and the decision was made to leave the old one or replace it with a new one. He believed that the machine "descended" from the trajectory, if the predicted and real coordinates or speed differed in the 7th sign. Since the tree was not symmetrical, but elongated towards the target, and the length of each branch was up to 10 or more ticks, the found tree “became outdated” rather slowly.

Thus, the trajectory was selected from a whole forest of trees built in dozens of ticks. If I followed the advice that absolute accuracy is not needed - this technique might be difficult to apply, it would be necessary to control the accumulated error. Ten iterations for each tick reduced the speed of trajectory calculation by 10 times (in fact, much less) but made it possible to calculate it ten times longer. Yes, and tricks like turning a blow to the tire is difficult to do without absolute accuracy (this is the question of sufficient accuracy).

After the previous optimization, problems with performance retreated: even reducing the tree building time by 3-4 times, I almost did not weaken the strategy. Another revision allowed to use the brake. There were problems: a lot of almost indistinguishable from each other states with braking already almost stopped the machine spent all the allotted time. I decided very roughly, but it worked: I forbade to slow down, if the longitudinal speed of the machine is less than the given one. Constant defeats on the cards with sharp turns stopped - the rating was fixed in the first hundred.

Then I added the use of nitro. Just a few more branching options, but for root only. Nitro was allowed to use if the length of the path found with activated nitro exceeded 7 "tiles". I couldn’t plan to use nitro after some time, I thought it wasn’t necessary and didn’t do to save resources. Bug rules and added a few "crutches" that would somehow hot shoot, put puddles.

The next thing I decided to implement is the collection of bonuses. By that time, few people collected them and, arriving last, it was easy to take second place in the race. I was too lazy to do the physics of collision with a bonus and did not do it until the end of the competition. So it was not difficult to add bonus collection, every tick in the physical simulator checked the intersection with the bonus. For taking the bonus reward in the 3rd evaluation function. So far, he gave a big reward for points, the average for "nitro" and for "repair kit", by zero for the rest (I still could not use it anyway). This version participated in the first round and took the 14th place, and after the round it consolidated in the "top20".

Preparing for the second round

At the weekend, while the round was going on, there was a time for a little rest and thinking again about the general approach. In addition to the fact that I did not know how to shoot and pour out puddles, oddities were observed in picking up bonuses, and the machine became worse off at high-speed sections.

I decided to start with the second problem. Adding braking to the variants of “branching” the tree of trajectories, even with restrictions, still had a negative effect on driving in high-speed sections, as the tree became less branchy. Having spent quite a lot of time experimenting with the evaluation functions of a tree, I did not achieve anything. And then he decided that it was possible to build both trees (with and without braking) and choose the best. It was a great idea, it turned out a composite algorithm, which I have expanded in the future, adding a tree with collision resolution to the composition. However, I had to reduce the number of iterations per tree at times, here the performance margin was very useful. Now the algorithm combines the advantages of both versions. In the above trajectory tree image, you can see three green lines — these are the best trajectories in each of three different trees. The black line is the stagnation section on one of these three trajectories.

The problem with bonuses was more fundamental. Studying the behavior of the machine, it was clear that she was driving past some very simple bonuses. The reason was that there were 2 bonuses ahead (closer and farther away). The strategy did not have time to calculate the trajectory on which it could take both bonuses and chose the second one, since it kept the track a little better. Although taking a neighbor, she would easily have found a way to take and distant. In order to motivate taking first bonuses in the first place, the exponential decay of the reward for any events was added to the third evaluation function. The best result was given by a rather slow attenuation of 0.99 per tick (2 times in about 70 ticks). I think this technique had a positive impact on the typewriter ride and on the shooting in the future, although it is certainly difficult to say.

Further analysis of the games showed that there are many defeats due to the hitting of fuel oil puddles. Added a penalty for hitting a puddle of fuel oil. Then he decided that this was not enough, he was very much afraid of quite innocuous puddles. Added to the simulator a drop in friction coefficient on contact with a puddle. I dig into “locale raner”, I found the formula - how the angular velocity changes at the first contact with a puddle of fuel oil. It turned out to be a “random” there, giving only 2 options: we twist right or left, and the speed of twisting can be calculated exactly. I made a “probabilistic” branch in my tree of trajectories, which has these 2 variants in its descendants and averaging over them. Puddles began to go around perfectly, and sometimes even to use in their favor.

The second round was approaching, it was time to master the tires. I did nothing fundamentally new here. Added to the physical tire simulator, their collisions with the sides and with the machine. It took a lot of time, I had to deal with the physics of collisions in the engine, to test everything. For taking damage, he gave a penalty in the evaluation function, and for death a very large penalty. After that, I already knew how to dodge tires and even use a collision with them to my advantage, but I still could not shoot. Now adding shooting has become quite a trivial task. It was necessary to imagine that I was an opponent and try to dodge my tire. If this did not work or did it but at the cost of plunging into the wall, then it is necessary to shoot. I substituted the enemy in my algorithm for finding the best trajectory and if the overall assessment of the trajectory (taking into account everything: movement, damage, bonuses) was greatly reduced during my shot, then I shoot.

For the enemy, trajectories were calculated for 150 ticks, and time was allocated 10–20 times less than the search for its own trajectory. The tricks with a turn on their own tire turned out by themselves - the strategy simulated a shot at the enemy and saw that its own trajectory, which was recalculated taking into account its flying tire, does not get any worse, and allowed the shot. The setting of puddles was added in a similar way, I don’t remember before or after the shooting. There were a lot of mistakes due to the small amount of calculations, but I finished the second round in the first place with a large margin. While there was a second round, I taught to shoot the buggy. I did it just like shooting tires. At first I learned to shy away from pucks, then I added shooting.

Final preparation

It's time to prepare for the features of the final. The organizers promised unknown maps and limited visibility. What would be decisive in such conditions was not clear. I had problems with driving in conditions of limited visibility, and decided to do this.

Likely, as well as all participants of the ending, I simply considered unknown "tiles" as crossroads. Refined visualization - painted "fog of war", all the "tiles" I know. Instead of unknowns I drew crossroads, now I saw the world through the eyes of my typewriter. The first viewing of the check revealed a problem. The machine found the best trajectory through unknown “tiles” that it considered to be intersections, but didn’t throw it off after their disclosure, although now it went straight to the wall. I wrote a small algorithm that “cropped” the saved trajectory tree every tick as new information was received. If the branch passed through an unknown "tile", and after its "disclosure" it turned out to be not of the type that was supposed to be - this branch was cut off from the tree. It was also necessary to do the processing of other events (the appearance of puddles, shells, bonuses), but did not have time, the whole tree was dropped for them.

There was another problem that visualization revealed. Some "tiles" were considered "passable" (not empty), although it was clear from the neighboring "tiles" that this was not the case. A slightly modified wave algorithm (re-passing through the "tiles" for which new data on the presence of walls was found) corrected this situation. Now entire areas of the map could “collapse” abruptly into a large empty area, as soon as it became clear from indirect data that there was no way there. I don’t know if this gave any increase in the strength of the game, but it became much more pleasant to look at the visualization.

What else greatly weakened the strategy in conditions of limited visibility? Unexpected turns created a lot of problems, fearing to touch the wall, the strategy was losing a lot of time. Either it was very slow when cornering, or, having slightly touched the wall, even at a slight angle, it went into reverse. I finally decided to tackle the resolution of the collisions of a typewriter with walls. Here, too, there is nothing special to tell - a lot of time for copying logic and testing, attempts to make all this not very slow. There were fundamental problems. If before that the trajectory was broken, touching the wall, now there are long (in ticks) pushing trajectories near the wall. The machine practically did not move, resting on the wall, and the tree of trajectories again degenerated on these almost indistinguishable states.

I had to return to the old idea, with which I experimented a lot, trying to solve a similar problem when braking. Its meaning was to replace the "length in ticks" with the real length. Although it spoiled the usual trajectories, for collision paths this is the best I could think of. Added this option to drive into your “composite” trajectory search algorithm. This strengthened almost all aspects of the strategy. , ( , « » ). , , , . ( ), .

, . , . : , , «» . . , , . , , . , , , .

. , , , , . , . «», «». — , , . , « », – , . , « » . - «» -, . , «» ( , ) . , . — , - , . «» «»? , « » « » , , ( ), .

, , . «», , . , — . — , ( , , ), - . , .

Conclusion

- . . , , .

.

Source: https://habr.com/ru/post/273649/

All Articles