History 30 places in the final Russian AI Cup 2015

Introduction

This year, in the Russian AI Cup, it was necessary to program the bot to control the machine (and even two in the finals!), And I decided to participate for the first time, because the topic seemed close and dear: a fan of racing, I go to karting from time to time (though without outstanding results), I often spend evenings riding virtual Hawaii in Test Drive Unlimited or Nurburgring in GTR Evolution.

I didn’t take a high place in the end (30th place in the final, finished the sandbox 48th), but the short time between the second round and the final was in the top-10 sandbox. Also, judging by the competition forum, a set of ~~crutches for~~ solutions like no one else has used in me, so if you want to know the details, welcome to the cat. The most interesting part in the post, perhaps, is about finding the optimal trajectory.

A detailed description of the competition can be read on the site , but the essence can be summarized as follows:

There is a map broken into tiles of different types (straight, turn, T-shaped intersection, intersection, empty tile).
There are 4 cars. One of them (in the final - two of them) is controlled by the player. Each server tick is transmitted to the bot-strategy that controls the machine, information about the world is transmitted, and the strategy must answer what actions it wants to perform. There are not so many possible actions: change engine power, turn the steering wheel, fire shells, pour out a puddle of oil, turn on nitro, press a brake
To complete the race, you must drive through all control points-tiles in a certain order (in fact, drive two laps of the route). The coordinates of the points are known, each one builds the path independently. In the first two rounds of the competition, the whole map is known, in the final the player sees only a small neighborhood near his cars.
Bonuses are randomly scattered on the track: if you hit them, depending on the type of bonus, you can replenish the number of shells for firing, the amount of oil to create difficulties for rivals, charges of nitro acceleration, repair the machine, or get an additional 100 points.
The player gets points for passing checkpoints, for damaging opponents, for picking up bonus points, and also gets an additional 512/256/128/64 points for finishing in 1/2/3/4 place, respectively.
After the end of the game, participants are ordered by the number of points. The one who scored the most is the winner, everything is simple. Because of the bonuses and points for damage to rivals, the one who came to the finish not always won before: for example, it is enough to pick just one bonus for 100 points, and already overtake the player who finished third.

Decision making strategy can be divided into several relatively independent parts:
')

Finding the way
Calculating the optimal trajectory for the selected path
Machine control to follow the trajectory
Shooting
Oil use
Using nitro acceleration

Finding the way

Initially, the usual wave algorithm was used to find the path: we launch a wave by the tiles from each control point, cover the entire map, and now the shortest path from each map tile-point to each control point is known. Unfortunately, everything is not as simple as we would like: the shortest path may not be too successful for one of the following reasons:

Contains sharp turns or too many normal turns.
Does not pass through the tiles with bonuses
Requires reversal
The path to this control point is optimal, but when you follow the next control point, you get a path (see above)

As a result of reflection, it was decided to try a method that scared me with its essence, but it turned out to be quite effective: complete bust.

In the beginning, (almost) all possible paths are built (sequences of successive tiles starting from the current one), with a total length of N, where N had to be chosen equal to 11 in order to meet the strategy time limits. I’ll clarify what “almost” is: the paths are built on the assumption that there is no need to be in the same tile twice if no control point has been passed from the previous visit. Thus, if at the stage of constructing the next possible path it turns out that the tile is already in the path, then the construction of the current path stops and the construction of the next one begins.

After receiving a list of (almost) all possible paths, each of them is evaluated, as far as it is convenient. When evaluating a large number of constants are used, selected at the level of common sense and the method "by eye". The path gets points according to the following criteria:

For every two turns in one direction in a row to impose a penalty of 30 points. Turns 180 degrees are penalized, because the machine has a very large turning radius and it is difficult to fit into a sharp turn, especially if the opponents actively “help”.
For every two consecutive tiles that need to go straight, add 10 points. It's simple: going straight is easier than turning.
For every two alternating types of turns (turn to the left, and immediately after him turn to the right, or vice versa), add 12 points (movement along the diagonals of the tiles happens quickly, therefore such ways are quite profitable).
If you need to turn around, impose a penalty of 150 points.
If there is a bonus on the expected path of moving through the tiles, add the number of points corresponding to the type of bonus. If the bonus is 100 points, then add 55 points; first-aid kit (with the machine having less than 50% of health points) - 25, charge of nitro - 35, oil - 15, projectiles - 15.
Add 10 * cos (a) points to the path, where a is the angle between the direction of the vehicle and the center of the first tile along the line. Due to this, the machine rarely changes its mind halfway through, that it abruptly needs to be turned and crashes into the wall.
If the first 5 tiles of the path coincide with the path calculated in the previous time, then add 10 points. The idea is the same as in the previous paragraph: not dramatically change the path.
If the path is considered to be at the start, then paths that go straight to 3 or more cells receive a bonus of 10 points. If there were no other machines, then this point would not be needed, however on many maps you can arbitrarily strongly want to turn right / left right away, but insidious rivals interfere simply by the fact of their existence.

For the calculation of bonuses for turns / straight lines and penalties for sharp turns, the last 4 visited tiles are also taken into account.

I will reveal what is meant by "on the intended trajectory." For each tile of the path, in addition to that in which the machine is now located, an angle is calculated for which the rotation occurs (0, + -90, 180 degrees). If the tile moves directly, then bonuses located at a distance of 250 from the axis of movement are considered to be potentially selected. If a turn occurs on a tile, then bonuses located at a distance of 200 <r <500 from the center of rotation (tile angle) are considered to be potentially selected. To make it clearer, I bring a picture; areas of the chosen path are painted over with gray colors, in which the bonus is considered to be potentially selectable (the debugger allows drawing only circles, so ignore the extra parts of the red circles). As you can see, the plots do not even coincide on the borders of the tiles, but for an approximate assessment whether it is possible to pick up a bonus or not, this method is suitable.

After evaluating all possible paths, the path with the highest number of points is chosen, and the trajectory of movement is calculated along it.

In the final, the whole track is not known, and only tiles at a distance of 2 from each of the player’s machines, as well as all previously seen tiles, are visible. We practically didn’t have to modify the algorithm: a distance algorithm was added to the wave algorithm when a new tile was opened, and it was also believed that an unknown tile could be passed in any direction.

Trajectory calculation

Well, the path is obtained, how can we go through it now, how to build an optimal trajectory? Turn to Google with the query Racing line AI ... or some other similar ... and get not very much. There is one video on Youtube , there is a mention of an article in the Game AI Pro 2 book: Collected Wisdom of the Game AI Professionals, and the source code for the TORCS bot mentioning the K1999 algorithm. A little more effort and I managed to find the disk attached to the book I mentioned, on which the source codes for the articles lie, although I could not find the article itself. A cursory review of what is available made it possible to conclude that the process of constructing a trajectory in all three cases is iterative. We take as a basis anything, and then smooth and optimize, and so N times.

So, what to take as “at least something” for further optimization? After some deliberation, I decided to try to do the following: in the tile-turns we put the trajectory point closer to the center of rotation, in all other cases we believe that the trajectory lies through the center of the tile that we need to pass. After that, we split the trajectory into smaller parts by adding one point in the middle between every two already existing ones, and then again.

The resulting approximation must be somehow smoothed. It was decided to use the approach described in the video and source code for the article: consider the trajectory as a sequence of hinges connected by springs. A “spring” is created between each pair of successive points, with a stiffness k, and the length of which, in a quiescent state, is considered equal to the distance between the points. The speed of each point of the trajectory is initially zero.

Now, at each iteration, optimization is considered every three consecutive points of the trajectory A, B, C. If the angle ABC is not equal to 180 degrees, then the “hinge” at point B tends to straighten, and the corresponding forces are applied to the points A, B, C. Also, forces are applied to point B to bring the springs AB and BC to the initial length. The elastic forces are calculated according to Hooke’s law: F = kx, where x is the difference in length between the current and initial length. The picture shows the forces that are applied to the points; It is assumed that the spring AB is elongated, and the spring BC is compressed. Forces seeking to straighten the joint (Fab, Fba, Fbc, Fcb) are proportional to the angle ABC and the length of the corresponding segment. The elastic forces of the springs are applied only to point B, since the elastic forces for points A and C are taken into account when processing the previous and next three points of the trajectory, respectively.

After calculating the forces for each point, its speed is calculated using the trivial formula V = 0.9 * Vprev + F, where Vprev is the speed after the previous iteration, and F is the sum of all forces acting on the point. A factor of 0.9 is added to make the process converge faster. For each point, its new location is determined by the formula Pos = Pos + V. If the new location is outside the route, it is reset to the previous value, and the speed is halved.

After several such iterations, something similar to the optimal trajectory of movement is obtained. In the picture, the black line indicates the initial trajectory, and the red line - after 500 iterations of optimization.

However, the optimal trajectory is the optimal trajectory, but somehow bonuses must be selected. One option was tested, which remained in the end. After half of the optimization iterations, the following happens once:

For each bonus lying in the tiles of the path, the nearest point of the trajectory is searched.
If the distance from the point of the trajectory to the bonus is less than 250, then the point moves to the place of the bonus, and its mass increases several dozen times.
Optimization continues.

The idea is as follows: if the forces acting on a point with the coordinates of the bonus are very large, then this means that the trajectory through this point is inconvenient for movement, and even despite the new large mass, it will gradually attract to a more convenient place, if are not large, the cost to pick up this bonus is small enough, and take it profitable.

The described approach to trajectory construction has several problems:

When crossing the border of the tile, the trajectory can change dramatically, as the initial approximation changes, which is further optimized, and the current speed and direction of the machine’s movement are not taken into account, as can be seen on the attached gif:

Partially managed to solve this problem by adding another point of the optimized trajectory behind the machine, due to this, the current direction of movement is taken into account and smoothness is improved.
The process is computationally very expensive: judging by the profiler, a bot spends about 80% of the time on calculating the trajectory, since a large number of floating-point operations are used (especially on calculating the angle and taking the root).
The process is not always stable, and the trajectory calculated on the next tick may differ from the previous one. Usually this happens because of awkwardly located bonuses: on one tick it is convenient to take them, on the next there are no more, and the trajectory can change dramatically because of this, and the machine starts to slow down and rush left / right.
Unfortunately, due to laziness and lack of time, it was not possible to get rid of one bug that led to such a trajectory in some cases:

Fortunately, usually the next time you recalculate the trajectory, everything returns to its normal state, so I left everything as it is. Most likely, this is a lack of implementation, rather than a conceptual problem, unlike the previous three points.

Motion control

After calculating the trajectory, you need to somehow go through it. This is perhaps the most simply implemented part of my bot, and it is surprising that IT works at all.

The bot does not control the engine power and always presses the gas pedal to the floor, and the steering wheel points the calculated path to the first point.

To decide whether to slow down or not, school physics is used in a class of about 7. For the nearest 4-5 points (the amount depends on the current speed of the vehicle), the radius of curvature of the trajectory is considered. As an approximation of the radius of curvature, we take the radius of the circle described about three consecutive points of the trajectory. After that, the speed of the car is squared and divided by the minimum of the received 4-5 radii. As a result, centripetal acceleration is obtained, with which it is necessary to move in order to drive along this trajectory, for which, in turn, it is necessary that the friction force (traction) allows this. If we assume that the maximum possible friction force is constant (which, as far as I understand, is not quite true, but I have not studied in detail the physics of motion), then if the centripetal acceleration exceeds a certain value (strictly speaking, mu * g, where mu is the coefficient of friction, and g - gravitational acceleration), then driving along such a trajectory with the current speed does not work, and you need to press the brake, which the bot does if the calculated acceleration exceeds 0.37.

Shooting

Shooting is implemented very simply: the bot believes that it will be in the next n ticks in the event of a shot, assuming that all cars are traveling at a constant speed. If the distance from the projectile to the opponent turns out to be less than 50 in one of the ticks, the bot shoots. Shooting is done only if the calculated distance traveled by the projectile does not exceed 2000. Also, the first 300 ticks after the start of the bot shoots only if the opponent’s machine has less than 30% health, because an extra 100 points are given for the kill, and at the beginning of the race the situation often occurs when one machine takes the lead, rivals are discharged into it, and it has few health points.

Butter

The oil spilled on the road throws a car that has passed along it in a random direction, and some time after hitting it at the car, the coupling properties of the wheels deteriorate. When deciding whether to spill the oil, the bot wakes up every time delusions of grandeur: he believes that all other rivals go the shortest way to the next control point, and the bot's machine is on the perfect trajectory. And if it is a little more specific, then each time the bot presses the brake, an approximate trajectory is calculated for each competitor for the next 10 tiles (without busting, simply based on the usual wave algorithm), and if it passes through the point where the bot car is now, then he pours oil.

Nitro

For use of nitro, the trajectory curvature radii calculated for braking are used. If the minimum radius of curvature for the next 9 points of the trajectory is more than 4000 (i.e. the trajectory does not contain sharp turns), then the bot turns on nitro.

Conclusion

Writing a bot turned out to be an interesting exercise, although it greatly interfered with work and sleep, but the finalist's sweatshirt was clearly worth the pain. The main feature of the bot compared to many participants is the almost complete absence of physics miscalculation. There are some more undescribed moments, for example, how a bot is selected from situations when it is stuck, or how I tried to optimize the code to meet the time limits, but they are not significant. Bot lacks the ability to try to drive around puddles of oil, and especially not enough attempts to dodge other machines, but this did not have enough time and effort; perhaps it would have been possible to achieve this by adding repulsive forces at the stage of trajectory calculation.

I want to thank the organizers of the competition for sleepless nights and interesting time spent, colleagues at work and friends who motivated to participate and were not allowed to quit when they really wanted to, as well as to http://russianaicup.ru/profile/JustAMan for a user-friendly interface for drawing in local runner.

PS: We are waiting for a detailed story from the winner, he promised!

PPS In the comments convinced to put the code. https://github.com/mbakulin/russianaicup2015 . Two commits, one - what drove in the final, the other - what ended the sandbox.

Source: https://habr.com/ru/post/273579/

All Articles