Russian AI Cup 2014: winning strategy

Continuing the good tradition of “revealing the secrets” of the winners of the annual Russian AI Cup competition from Mail.Ru, I present this article to all who are interested. I will not describe the mechanics of the game world and other rules, if suddenly there are those who are interested in this article but do not know the rules, they will be able to find them on the official page of the championship.

')
This year, unlike in the past, there was a “physics” task, not a step-by-step task, which determined my heightened interest in the event. In general, I want to say right away that by physics I mean not the use of phys2d or any other physics engine, but rather a certain set of properties of the game world. The first important property is the continuity in time and space: small displacements of objects or control delay also slightly change the result. The second property is a sufficient complexity of the laws of motion of objects, so that direct modeling of the world with absolute accuracy is impossible (at least in a reasonable time). The impossibility of accurate modeling forces us to do what is being done in real physical science — to determine which phenomena are most important in the case under consideration and to build an approximate model of the system. Regarding the task of the competition, important in modeling was a fairly accurate prediction of the free movement of the puck and hockey player, as well as the rebound of the puck from the sides (not in the gate area). It would also be useful to have a prediction of the rebound from the sides of the hockey player and the puck from the goalkeeper. Just want to draw attention to the words “fairly accurate” - to achieve an absolute, bit-wise coincidence of the results of the prediction with that which gives the game engine no sense. And those people who did it were wasting time. For example, my final did not take into account the change in acceleration and speed of rotation when the endurance dropped in the process of motion prediction, only its current value was taken into account.

I also want to say about the accidents, which this year was much more than in the past. Accident adds the complexity of accurate prediction (makes it fundamentally impossible), but this is a bad, “cheat” way. This "deep" physics allows you to build more and more accurate models (at the price of computational complexity or more advanced algorithms), while randomness does not provide this depth. Yes, and the continuity of processes, it affects in a negative way. From a practical point of view, the additional variation in the results of matches only complicates testing. This year's championship will be the first in which I, in general, did not deal with the selection of optimal coefficients, because even long half-hour matches did not give a clear understanding of the strength of strategies. But, enough with digressions, I turn to the chronological description of the course of the competition.

Beta test

Most of the week of the beta test, I spent on the reverse game physics. And me, a physicist by training, it is much easier to get the required formulas using experiments, and not by decompiling the local-runner. All single contacts (shots), except for the collision of two hockey players (which I, in principle, did not deal with) and the frontal impact of the puck on the goalkeeper (in which someone's hidden rotation participated), I learned to predict with accuracy of floats. There were enough strange and tricky moments in the device of the world, for example, a rod device (section 25 units long with a tenfold reduced bounce), or an ejector wall for hockey players at the gate (and without resetting speed), or the position of a puck held by a hockey player (with speed compensation without friction ), but all these little things did not have much importance for the strategy (and even changed in the course of the championship), so I will not dwell on them.

The second important thing that I managed to do for the beta test is to calculate the range of angles at which the gate hits. I made this prediction from purely geometric considerations, without any modeling of the flight of the puck. As the first corner, I took the direction to the far corner of the goal, after that I counted the flight time of the puck in this direction. Then I found the position of the goalkeeper for this time and considered the angle at which the puck would fly by touching the goalkeeper in that position. Then he counted the flight time in a new direction and repeated the process several times. It is worth saying that an annoying bug crept into this calculation, which led to an overestimated angle. And although I, looking through the games, repeatedly noticed strange blows and suspected the presence of a bug, I didn’t find it before the final. However, this did not prevent my strategy from winning in the second round (this is to the question of sufficient accuracy).

Key algorithm

After the end of the beta test, I completed the experiments with physics and actually engaged in strategy. Unlike tanks, where I, first of all, did defense, this year I started from attack. As a result, I didn’t do a normal defense, it remained a “set of crutches”, although it worked quite well. The key to the attacking part of the strategy (and then to all the other parts of it) was the algorithm for reaching the target at a certain angle, which I will discuss in more detail.

We first consider the method of the fastest achievement of a certain point (excluding walls and other hockey players) with zero initial speed. A non-zero initial velocity only leads to a fixed displacement of the region of reachable points for each time step, without changing the configuration of the region itself. Rather tricky geometrical considerations (I will not give any considerations because of their complexity and the lack of mathematical rigor) lead to the next set of “optimal” trajectories.

Rotate left / right with forward / backward acceleration through an angle of less than 90 °, then continue driving without turning without changing acceleration.
Rotate left / right with backward / forward acceleration through an angle of less than 90 °, then reverse acceleration with continued rotation for the next 90 °, then continue driving with acceleration, but without turning.

Due to the lack of rigor, real optimal trajectories may differ slightly from these, but this is not critical (see the reasoning above for sufficient accuracy). Thus, the trajectories are parameterized by the flag of the acceleration type on the main part of the trajectory (forward / back) and the direction of the turn (left / right), as well as the end time of the turn. In case this time is more than the time of turning by 90 °, then at the beginning of the trajectory of movement there is a section with the opposite direction of acceleration with a duration of [turning off time] - [turning time by 90 °].

Similar geometric reasoning shows that if it is necessary not only to reach a point, but also to exit at a certain angle, then the “optimal” trajectory at a sufficiently large distance to the target looks like two “optimal” trajectories connected by their straight sections to the case of simply reaching the point. That is, a turn with a possible reversal at the beginning, then a movement without a turn, then a final turn with a possible reversal at the end. The trajectory parameters will be three flags - two directions of rotation and the type of acceleration in the middle, as well as two durations of rotation (not counting the total duration of the path) - at the beginning and end. According to these parameters, it is necessary to carry out optimization.

Since there are at least two parameters, a simple search (as was enough in the same tanks) will be too slow (quadratic complexity) and more efficient multi-dimensional optimization algorithms must be involved. I, as a “connoisseur” of genetic algorithms, used them, although this class of algorithms is not particularly effective and if there was time, it would be worth experimenting with others, for example, with annealing imitation. More specifically, the algorithm was as follows. To begin with, we do a search with a certain step among the trajectories of the quickest achievement of a point (without a second turn) and on these trajectories we do the calculation of points (also with a certain step) for scoring a goal with a pass and a strike with a maximum swing. Points per goal were considered as the probability to fall into the desired range of angles, taking into account the specified normal distribution upon impact, multiplied by the exponential damping coefficient over time. Next, we select several trajectories with the maximum number of points, add “mutated” variants of these trajectories (a non-zero final end appears at the same time), re-evaluate the probability of clogging, and not with any step, but with the best moment of impact found earlier, and again we select some of the best. We repeat all this several times with the ever diminishing power of "mutations."

The probability of entering the gate without a final turning is almost always close to 0, however, the normal distribution has fairly large “tails”, so at the first stage of the search, points of impact were found, and in the process of genetic optimization, the exit method was chosen. Moreover, an important role here was played by the bug with overestimation of the sector mentioned above, and at the same time with its editing before the final, artificial “tails” had to be added to the function of estimating the probability of hitting.

Another important characteristic of the algorithm was the time-wise optimization, which I only thought about in tanks, and this year I implemented it from the very beginning. In addition to the trajectories from the brute force, the optimal trajectory from the previous step is added to the common pool, with the corresponding corrected times. This adds the following property to the algorithm: it’s not scary that for the current step a not quite optimal trajectory was found, at the next step the optimization will continue from the point where it ended. As a result, this algorithm became the cornerstone of the whole strategy, the framework on which everything else was built.

First version

The first version of the strategy, which was ready a week before the first round, in addition to the main algorithm, included the following elements (in chronological order).

Prediction of possible positions of opponents, based on the code for the initial search of trajectories. The possible positions of opponents at each step, as well as the approximate center of the club sector, were entered into the map in a square grid. Sampling was done on this map, with fairly complex filtering (much simplified later), and by optimizing their own trajectories, penalties were given for being close to enemies.
The initial search was made not only for the owner of the puck, but also for his other hockey players. The positions achieved were assessed taking into account the time to achieve this and its neighbors (on the map) positions by the enemy. Several positions with the highest score became possible targets for the pass. When calculating the impact points, for the owner of the puck, not only shots towards the goal, but also towards the found targets, were moved. If, as a result of the optimization, the option with a pass was selected, the recipient began to move towards the target position.
In case the puck is in free flight, we precisely calculate its position at each moment of time. We perform full optimization for all our hockey players, only in the function of impact assessment we consider the selection of a puck, or its knocking out, if there is a threat to the goal. This part of the code also gives the correct selection of passes.
In addition to the selection and knocking for a free washer, we also consider a blow to someone else's goal, absolutely similar to the case of possession of the puck. Now it becomes possible to score goals with one strike, although only under rare circumstances.
As a crutch for protection, the following was followed to the point in front of the goalkeeper, unless the enemy had the puck and the cleverly calculated point between the goal and the owner of the puck otherwise. The following was implemented in the same cycle of the initial search, and without taking into account the speed, so the hockey player constantly flew past and, as a result, dangled back and forth near the gate. This follow-up algorithm was assigned to hockey players unoccupied in other algorithms, so there was usually one active attacker with a puck or a free-flowing one going to intercept and one defender at the gate. In case the opponent has the puck, then both hockey players tried to move to a secret point.

First round

Testing in the competition system with real opponents showed that, firstly, the strategy is too slow and we had to quickly correct the parameters of the algorithms, and, secondly, the algorithm for intercepting enemies with a puck is no good. As a result, for the second version of the strategy, I made it so that only one hockey player travels to the secret point, and the second one hangs around the point next to the goal as usual. In version 3, instead of following to a secret point, I used a simple algorithm of the QuickStartGuy type, but with a lead, and also added a check of the danger of the interception of the puck when passing or hitting the gate to the impact evaluation function. However, a serious bug crept into the new code and I had to roll back. In the 5th version, in addition to fixing bugs, I made a serious reworking of the code in order to get rid of heavy trigonometry inside loops.

The 6th version (and its corrected version - the 7th, with the sticking stuck on the backswing) - is a new attempt to rethink the methods of protection. In conditions when the existing algorithm is no good, and the universal one is not invented, I implemented a new crutch, although of a better quality. Now, for the enemy with a puck, my standard impact optimization algorithm was chased away, however, without taking into account the influence of opponents, and the trajectory of the puck found was processed with a slightly modified free-washer interception algorithm when optimizing for their hockey players. A nice bonus was the fact that the strategy has learned to knock the puck right into the enemy gates, the algorithm of blocking from the summer has proved itself here.

The 8th version playing the first part of the first round differed minimally from the 7th, but one of the changes was decisive. By removing the time attenuation when calculating the probability of the interception of the puck by the enemy, I achieved that my hockey players stopped “stupid” in front of the gates of the enemy, waiting for something incomprehensible. In the second part of the first round, I added a score of passes with a rebound from horizontal sides. For this purpose, the target point was reflected from the desired side, taking into account the rebound coefficient. The 8th and 9th versions of the strategy gave me the 6th place in the first round and, in general, showed that I was on the right track.

Second round

In the 10th version, I was engaged in adapting the strategy to rule changes in the second round and, at the same time, the final (albeit without taking into account the change in endurance during the prediction). But in the 11th version, something was added that ultimately brought me victory in the competition.

The first fundamental addition is one-touch goal clogging. Moreover, the implementation of all this is quite simple, there are no multi-dimensional optimizations for several hockey players at once. I have already mentioned that I did not chase the first stage of the trajectory optimization algorithm, the initial search, for those players who did not own the puck. Now I have gone further: for “non-core” hockey players, I launch a full optimization, as if they are the owners of the puck, and for the optimal trajectory I place the puck position at the moment of impact on the list of goals of the pass. A pass to this goal position is valued in the same way as a simple goal (and even more in future versions), so one-touch goals from a rare coincidence of circumstances became mainstream.

The second important improvement of the 11th version is the algorithm of following to the target point. Now, the calculation of the trajectory estimate does not include the current position, but the limit position of the hockey player after a full stop, plus I added orientation accounting and my defenders stopped meeting the puck backwards. Since in the second round of hockey players there were three, then, in addition to the “main” and defender, an additional player was formed, to whom I registered following the point a little higher or lower than the middle of the field (depending on the position of the enemies), from which access to enemy goal.

The final

In the 12th version, I quickly implemented the replacement logic for the final: QuickStartGuy-shaped following into the replacement area for “out of the game” and a slightly modified follow-up algorithm for replacing during the game. And then I sat down to implement various small ideas that were formed during my championship. I will list them in chronological order.

Added a penalty for transferring the puck to a place from which the enemy can knock it straight at my gate.
He added the logic of the game in overtime without goalkeepers: he corrected the assessment of the sector of impact and pushed the defender a little deeper into the goal. When updating the code for calculating the angles, I noticed the above-mentioned bug with an overestimated hit sector and corrected it.
I implemented the normal processing of the swing state: before, I simply performed a swing on the memorized number of ticks, and now I am taking a full look at the choice of the optimal moment for stopping the swing.
Implemented taking into account the probability of interception by each individual hockey player of the opponent, and not all in a crowd, as before.
In the process of reaching the target position, I began to strive to keep the orientation not at the final direction, but at a specific point. This solved the problem with a long slip of the hockey player to the target point, at which there were holes in the defense, and also removed sticking, when the defender could not decide which way to turn.
Made a more accurate, than a rough reachability map, an assessment of the sectors of interception of the puck for the first couple of dozen steps.
Added protection from one-touch goals - in the case of a free-floating puck, driving the optimization algorithm for the enemy and updating the predicted puck position. This algorithm added a negative side effect - it would seem that a completely safe trajectory of the puck was considered dangerous and my hockey players began to try to knock the puck, rather than just pick it up.

Changing the strength of the game from each of these improvements separately was not noticeable, but all together they gave a double lead over goals of the 13th version from the 12th in long matches. All these improvements in the composition of the strategy of the 13th version were played in the first part of the final, but I (and not only me) noticed a rather large number of goals in my own net due to a side effect. And since my strategy was on a par with my main competitor, I decided that even such a small number of goals could determine the winner. Therefore, in the 14th version, which played in the second part of the final, I added a penalty for trying to knock the puck into my own net. Whether this is a change, or a change in a competitor’s strategy (I’m more inclined to the second version), they allowed my strategy to break away, and I could become a two-time winner of the Russian AI Cup.

Afterword

The separation of my strategy from rivals this year turned out to be much more serious than it was in tanks. I confidently stayed in the first place of the sandbox until it was closed, which I could not do in 2012. Then, as a result of random fluctuations, I did not even get into the sandbox winners. True, I still did not get the additional prize, but not because of the weakness of the strategy, but because of the new rules.

Watch fights can be in my profile .
Strategy source code is available here .
My old article for 2012 (tanks).

Source: https://habr.com/ru/post/241553/

All Articles