Gold Medal at the Russian AI Cup 2013 - how it was

Greetings, Habr!

I will write about how I happened to take part and win in the annual championship on programming of artificial intelligence Russian AI Cup 2013 (codetroopers). I acted there under the nickname slash and won first place both in the final and in the sandbox at the time of summing up in it.

About the championship

First, a few words about the championship itself for those who do not know what it is.
')
The championship itself is organized by Mail.Ru Group and is held online, that is, you can participate in it without getting down from your favorite computer chair or sofa. It is open, that is, anyone can register on the site and begin to fight for the top places. It is being held for the second time, the last tournament was held in 2012, and we hope that the guys from mail.ru will continue to continue in the same spirit. More information about the championship can be found on the website of the championship russianaicup.ru , as well as in the blog of Mail.ru Group.

Introduction

Now that will be in this story, and what will not. Probably, there will not be detailed instructions on how to reach the final and win it. Agree, it would not be so interesting if such an instruction was. There are no technical details on how to implement a particular function, what tools to use, how to write a program. It will be rather my personal impression of the championship, let's say, the sediment that remained after it. Plus some ideas, some of which seemed interesting to me.

Although I do not plan to describe technical details in my story, but I will be happy to answer such questions if they arise in anyone. You can also see the strategy code itself for more detailed analysis: drive.google.com/file/d/0B5pWCOQbKqEuekZNRWNkQUpNbnc/edit?usp=sharing . Of course, it was not written perfectly, the development format still imposed its limitations, and global variables are used, and all in one file, but I think that is quite readable.

I will say right away that I did not do anything supernatural in my strategy. Most of the ideas that I eventually realized came up in one way or another from other participants, many of them were already described on the championship forum. Perhaps I managed to combine the best of them so that they earned as efficiently as possible.

How it all began

I’ll start, perhaps, from last year, when I took part in such an event for the first time.

In fact, the desire to write a strategy that itself would play against a living person or against another strategy came to me when I was still a student and played different toys myself, but then I lacked programming experience and some other technical knowledge. and most importantly, of a specific goal that would force one to start doing it. And since I was not looking specifically for this kind of competition, this desire remained something of a dream until at our university forum (hello to the guys from forumlocal.ru) I came across a discussion of last year's championship.

I must say, the discussion there was quite active, and then many people got involved in this matter, but I was a little unlucky, since I noticed this topic and began to delve into it already before the very beginning of the first round, of course, while I understood localrunner ' om, while reading the documentation, while writing something, the first round has already passed and I did not get into it. Trying to get into the second round in the final, too, seemed like a difficult task, it was still difficult for me to assess the level of other strategies and the chances of being in 50K, so I stopped at the fact that I just finished my first idea - namely, looking at the tanks of others map of profitability of positions and then go to the maximum point along the gradient. Well, plus a couple more chips like churning ammunition that hit me, firing ahead of the curve and coordinating the turn of the gun and the body. Then I poured it all onto the server and waited where my strategy settles.

As a result, she finished the competition somewhere around the 2-3rd hundred. I was completely satisfied with this result, as I showed that it would have been possible to get through to the second round if I didn’t have such a big expense, if I started a little earlier, well, at least they will give me a T-shirt, and there will be an incentive to play further. Well, I thought, the main thing next year began not provaflit.

This year I remembered about the championship somewhere in October, I looked through the Internet a bit, I didn’t find anything right away and decided that I would probably find out about the beginning somehow, again, who can check the forum. As a result, an alert arrived in the mail on November 9th. The truth was generally busy at that moment, so at first I ignored him, I remembered only after 2 days, when the work was released a little. I looked at the site, yeah - everything has already begun, but I’m sort of done before the rounds. I downloaded the package, everything started, it took at least the time to set up - since last year nothing has changed, but I have not yet managed to forget everything.

Formulation of the problem

Briefly about the task that was proposed to be solved by the participants of the Russian AI Cup 2013 (copied from the championship website).

The player has at his disposal a squad of several (from 3 to 5, depending on the stage of the competition) fighters (units) with different characteristics and abilities. Unit sets are the same for each player. Units get the right to take turns. The course of the next unit begins only after the completion of the previous one. Each fighter's move consists of one or more actions that the fighter performs until he has enough action points or he forcibly ends the move.

The game field is a rectangle divided into square cells. The number of cells is 30 × 20. At the beginning of the battle, the players' units are arranged symmetrically every in their quarter cards. A unit occupies exactly one cell. Two units can not simultaneously be in the same cell.

A player receives points for damage inflicted on another player’s fighters (1 point per unit of damage), for destroying an enemy fighter (25 points), etc.
The number of points scored by players is a factor that determines the winner (or the place taken). The more points - the higher the place.

A more detailed description of the game world can be found on the championship website in the "Rules" section.

Basis of strategy

We will use the same idea as last year - to calculate the profitability of each position and go to the most profitable. She has established herself since last year, and it is unlikely there is something fundamentally different you can think of. Plus, there is still a completely discrete time and the map is small - it means complete bust will be enough.

In total, fundamentally different, or you can even say major, I had two versions. The first played in the first round, passed in the second, and already before the second, the second version was released.

I'll start with the first one. So to say, we write that the first comes to mind. The strategies here are actually 2, one local, which will determine the specific position where our soldiers will go, and the second is global, which determines the overall direction of movement of the whole team and its behavior.

The goal of the global strategy is to find the enemy, kill the enemy, and then regroup and heal, if necessary, to move on to the next enemy. Naturally, there are 3 modes of action in which the local strategy will act in different ways:
1. Intelligence - the mode in which the team is located until it sees the enemy.
2. Battle - as soon as the enemy is discovered, we begin to fight with him until we kill all the soldiers, or until he retreats, or until we retreat.
3. After the battle - the mode is activated after the battle mode, if there are no enemies in sight. After another 2-3 complete iterations, if the enemy is still not there, we believe that he either ended or ran away (or we ran away). So again, you need to search for the enemy, and we go into intelligence mode.
The concept of a global goal also arose - in my case it was just a cell on the map, where, according to my estimates, the enemy nearest to me is located. My team will always strive there. The exception was the initial goals on some maps, namely, those where the team is divided into two groups - there we had to manually set the goal at the initial moment, so that they would unite, and only then go to the enemy. Even after the first half of the final, I manually set the first goal on the fefer map, otherwise the team would climb into the tube of death and often be ambushed there.

Now local strategy.
So, the soldier looks at all positions (cell + rack), which he can reach in his full turn, for each he considers several parameters:
1. Security. The rude formula looks like this: the initial number of lives of our fighter minus the maximum damage that he can fill in this position.
2. Points that he can fill there, i.e. possible shots or throws of a grenade from this cage, plus, for the medic, the treatment is also considered points, since this is his main profession. If the enemy soldier dies as a result of our shot, in addition to points, we also increase the security position. If the bonus in this cell is either points too, if you are in battle mode, or safety, if you are in another mode.
3. Priority. If we shoot someone from this cage, then the priority tells how profitable it is to shoot at this fighter there. So, for example, the highest priority for a shot is a medic, commander, sniper on all maps except the labyrinth (in the labyrinth, on the contrary, it has the lowest priority for a shot). If we just go into this cell, then the priority is the proximity to the global goal. The closer - the higher the priority. Intimacy here is measured in steps.

At first, to compare two positions, I tried to simply compare the sums of these parameters with some coefficients, but I quickly realized that this was not enough, so there are some critical points for safety, for example, 0, when passing which position becomes significantly worse. Even if we fill there with a little more points, but security is just below zero, the soldier will die, and we will lose significantly more from this. If, on the contrary, we will beat out all of his life from someone else’s soldier, it’s better than if we beat a little more out of another soldier, but he remains alive.

As a result, a complex function appeared comparing the two positions in all the available parameters. In the first version there were about 5 cases, in the second - about 10. Here are some of them:
If the security position is more than 50, and the superiority in glasses is more than 50, then we choose this position without hesitation.
If the security of a position is more than 50, the loss in safety is no more than 30, and the superiority in points at least 20 - we also choose this position.
Plus, in some cases, we ignore some parameters, for example, if we compare the position from which we shoot, with the position to which we are simply going, then the priority is ignored. If the security position is close to 0, then ignore the priority of the global goal. Where there is, they can kill us, and here we are still thinking about something global.

I also wanted to regulate the general behavior of my fighters in terms of aggressiveness. To do this, I identified several behaviors that in fact simply influenced the balance between points, priority and safety when comparing positions by changing the coefficients.
1. Normal behavior - an attempt to maximize balance points and security, are in this behavior, if the number of enemy fighters is the same as ours.
2. Aggressive behavior - we place more emphasis on points and priority less on safety. Behavior is activated, if the enemy has fewer soldiers than me, or if I have not seen the enemy for a long time, you need to look for it more aggressively.
3. Rush - we finish off the enemy, if he has very few soldiers, or if we are in intelligence mode for a long time, and at the same time we are lagging behind on points. You need to quickly run to find someone.
4. Protective behavior - we try to act carefully, focus on security. It turns on when we do not know the order of the move with the enemy, or when he has a slight advantage in numbers.
5. Flight - stupidly dumped from the battlefield, if the enemy has a strong numerical superiority.

After this method of ranking and a small shamanism with coefficients, the strategy began to pretty well move toward a global goal, and then shoot at enemies.

Problems encountered

However, there were some problems.
The first - the soldiers could go around some obstacles from different sides. To avoid this situation, I added penalties to the security of each cell. Penalties were of 4 types: a fine for the minimum distance in Manhattan to an ally, a fine for the maximum distance in Manhattan to an ally, a fine for the minimum distance in steps to an ally and a fine for the maximum distance in steps to an ally. It is clear that if the first distance is large, it means that the soldier is completely separated from the rest, for which he was fined. If the first is small and the second is large, then it is separated, but not alone, but with someone else. This, of course, is not so bad, but all the same, we impose a fine on him so that there is an incentive to connect with the others. The distance in the steps must also be taken into account, as the soldier may be close, but behind the wall. On the other hand, close behind the wall, and far away in steps - this is better than a little closer in steps, but further along manhattan, therefore you should not throw out penalties for the distance over manhattan. Favors are given only to the leader, who must run forward, thereby increasing the distance to the rest of the group, the rest should reduce it. I choose a scout as a leader, but if the scout is completely in a foul situation or has died - the one who is closer to the goal becomes the leader.

The second problem - being in the intelligence mode, the soldiers did not see the enemies, so security was considered in some heuristic way, often overestimated, which led to the fact that in pursuit of high priority, the cells ran out to the enemy, and seeing him did not have time to hide back. I had to artificially underestimate the priority ratio, if only 1-2 steps were left until the end of the warrior’s turn.

The third problem - sometimes there was a situation when, at the beginning of the turn, there was a bonus next to the soldier, but he did not take it, as he evaluated the position as a whole and believed that he would run better closer to the global goal and get more points for it for priority, What will take the bonus next, because what will happen after he takes the bonus, he no longer counted. It's the same with shooting - you can bum once, and then hide, and the soldier instead immediately runs behind the wall, because he thought it was dangerous to shoot from that cage.
A rather simple natural solution appeared - to normalize points by the number of actions spent, then he will easily see that you can take a bonus for only 2 action points, and you have to go there already 10 or 12 points, or shoot for 4 action points instead of fleeing for 8 action points.

However, with this rationing, I suffered a lot. For example, it turned out to be most advantageous to stand still, since action points are not spent at all, or there were wrong decisions about whether it was better to shoot, or throw a grenade, etc. In this normalization, a bunch of crutches and already became confused with them, in the end, with difficulty, made everything somehow work tolerably for the first round with the thought that the approach was wrong, and in the next version you need to make a calculation of the soldier’s full move.

The final touch is to learn to determine who walks before, the enemy or me. Already in the first version, I realized that this information is very important, for example, when choosing a target for a shot, I had higher priority for enemies who go earlier, but it was necessary to determine whether this soldier had already left or not, their medic goes before me , or vice versa.

To begin with, I implemented the easiest ways to determine the order of moves:
1. If the enemy has just been in this position and now we have seen him - it means he walks earlier / later, depending on the type of fighter. Or the soldier remained in the same position, but his action points changed, a grenade or something like that disappeared.
2. If we saw in the last move, but then he disappeared, then with a high probability he left somewhere. Here the second option is possible - he was killed, so the second method is less reliable, but you can also use it.
In the first version I implemented only these two options, then another appeared.

So, the first release version is ready, confidently wets smarts, uploads to the server, I see that after all the described fixes she keeps well (by that moment she has already crawled to 30ki), she will pass to the second round.

Second version

It seems that there was no sense to cut the first version, because I knew that I would redo the main logic, so I decided to take a short break and save up ideas.

In general, it seemed to me, I was much better at inventing new ideas, if I didn’t write anything at this time, my thoughts didn’t waste my efforts to decide exactly how to move the fighter right there, how to choose the right rack, etc. You could calmly talk globally and evaluate which approach is worth implementing and which is not. For these couple of days before the first round, I finally realized that I would be broke, and I would need to do something versatile for me to sew the strategy under some specific patterns and places on the maps. In general, when developing, I get much more pleasure if my program or system does everything itself in an automated mode than if it is necessary for it to set some input parameters, specify something, manually set some hints, etc. Here, natural laziness in its good manifestation probably plays a role - like, I'd rather think a little more now, but then I don’t have to do anything, they will do everything for me.

In general, during this month I have developed a similar way of working out - in the morning at breakfast I admire the battles of my soldiers, I am glad for their achievements, but I also note for myself some moments where I could have acted differently. Then, while going to work or from work or somewhere else - there is time to think about all these moments and realize why the fight went wrong. It is important to first try to find exactly the global cause, and not to write off all at once with an incorrect factor, by correcting which you can spoil the remaining 10 fights. Realizing that there are flaws at the global level, as a rule, I came up with right away and some solutions that were already implemented in the evening. It happened, of course, that it was not possible to endure it until the evening, a little free moment at work was given out - code it faster, faster, while it was lukewarm.

So, the main changes that appeared in the 2nd version.

First, the search now takes place not in all the cells a soldier can reach, but in pairs of cells, or rather, in pairs of positions (cell + rack), one of the positions is the action position, i.e. in which the soldier He does, and the second is the final position, where he finishes his turn. In a position of action, it’s not at all necessary that he should shoot at someone or heal someone, he can just walk there and get points for it. For example, if there is a bonus.

Since the search is expanding significantly, my strategy begins to climb out of the allotted time frame - at the first stage, it is just the caching of the calculated position parameters that saves, although you will have to return to this problem more than once.

Now you need to compare pairs of positions, I called them goals. Each goal remains the same parameters, namely
1. Security final position
2. Points in the action position
3. Priority of the final position in the sense of a global goal + priority of the action in the position of the action, if the action takes place.
Plus one more parameter is added:
4. Points that I plan to get in the next moves. Here we add, for example, glasses for the fact that a soldier has discovered a cage that used to be in a fog. Points that an ally fighter will receive if I highlight the prospective enemy. Points that I can get on the next move from this final position.

With the help of the same “points on the next moves,” it was convenient to evaluate the shooting at enemies that the allies can finish off. So, for example, if I thought that I could shoot a soldier at 20 points, and then my sniper walks, and the soldier will not go away during this time, and no one will cure him, then I write myself 20 points, as well as 80 + 25 points received on the following turns. And it turns out more profitable than to shoot at another goal, albeit at 40 points, but which will have time to leave before the sniper moves.

And at first glance, a complex maneuver of the type to highlight an enemy soldier with one soldier, injure him, and then finish off with a sniper while he has not gone anywhere, it turns out automatically, and you don't even need to tweak anything. At the same time, the enemy may not even notice us.

Then I managed to knock out an enemy sniper, but only because my move was before his turn, otherwise he would have shot my sniper in the same way. It seems that the first to walk is a bit more profitable, but on the other hand, the one who walks the last one has more chances to fill up the scout with the same whale, as he sees him after he has already gone.

Further, calculating the possible damage to his or someone else's fighter, which he can inflict on a given position, I begin to take into account the possible movement of the fighter before the actual action. Previously considered only damage inflicted from the scene.

I add another option to recognize the order of the players:
3. I analyze the points received by the players and check the types of fighters who could inflict them. Acts almost smoothly, plus there is an opportunity to determine the order of the player, never having met him - a very big plus. It is strange that did not add it immediately.
Nevertheless, only the 1st is an absolutely reliable method, so until it works, I think that the order is almost certain, but still there is a very small probability that it is not defined correctly, so you should not make any critical decisions based on it.

The strategy starts to fight beautifully, smoothly choosing the most unprotected target and knocking out enemy soldiers with two-way schemes, but everything breaks down when most of the enemy soldiers find themselves in the shadows, and as they approach the 2nd round, almost all the tops are played, and they often manage to spy only one or two soldiers.

Of course, for this case, I have already accumulated an entire carriage of heuristics of the type. If there is a free cell next to the soldier, which I do not see, perhaps his ally is standing there, and in this case I, for example, multiply his damage by some multiplying factor, to take into account also damage from his ally. Or vice versa, when calculating the profitability of a grenade throw, I artificially raise points for a throw if there is a free shadow cell next to the soldier in the hope that another soldier is standing there. But the further, the more difficult it becomes to synchronize all these heuristics, so that they do not contradict each other, if there are 5 fighters, but only one can be seen, then sometimes it turns out that heuristics overestimate too much, or on the contrary underestimates the situation, in general, this is all for me starts to like less and less.

Transition from heuristics to exact model

In the end, the decision comes to abandon all heuristics and artificial spinning of points in favor of a clear mathematical model for which everything can be unambiguously calculated, and the rate is thus made on the accuracy of this model.

Now, each enemy fighter becomes blurred across the entire playing field in accordance with a certain probability distribution. , , , , , 1. , , , , , , . , . , , , , . .

. , , . .

. 3 , 4 , 12 , . , 12 , , 36, , — 2 . - . — . , 0.001, , 0.01, .

, . 2 .

— , , , , , , , , , .

— , . ( , - ), - . , , , . , , 4-5 . , , . , , - , .. , 1.5 .

, - , , - - , , . , 1 — , . , , , . , , , .

Probably somewhere at this moment there was the 2nd round, after which I crawled to the first line of the sandbox for the first time and even lasted there for some time.

Some improvements

, , , , , « , ». , .

, , , . .

3 . , .

— , , .

Began to process shots into the dark, i. to analyze the points, hit / not hit, even tried to predict that the soldier I shot was killed as a result of my shot. As a result, a merry bug ogreb that manifested itself in this battle: russianaicup.ru/game/view/519509 . , ( 10 ). , ( ), , , , , — , . , , . -, , , , .

, , , , , - .

, . , , - , . , , , . , . , , .

, - ( ), . 4 , . , , , . , , , , .

, , .

, .

- , , , - . . , , . , , .

, , , , .

. « ». , — , . , . .. ( ) , , , .

( , , , ). , , — - . , , . , ( , :)), , ( ), , - , , . , , - , , .

- . , , . , , — . -.

, . , , , , : russianaicup.ru/game/view/514859. It is clear that a survivor physician should easily overwhelm a sniper, but for some reason, he nods into a corner instead. It turned out that whenever I have one fighter, the minimum distance to the other fighters is 100,500 (some initial value, which must be overwritten by the first ally). As a result, the fighter gets a huge penalty for the minimum distance and in convulsions tries to hide from the cell with security -100,500 into the cell with security -100499 and no glasses are interested in him at this moment.

I'm not talking about pluses instead of minuses, when instead of a fine for a long distance to the allies, the soldiers received a bonus for this and happily ran in different directions. Such lapses were found even in the local runner and did not reach the server.

About time, otzhiorye my strategy.

At the forum, somewhere they noticed that it turned out to be one of the most voracious, on the verge of a foul, so to speak. Of course, first of all this is due to a sufficiently deep brute force, but I did not have the goal of squeezing the maximum of the allotted time. Understandably, there were a few moments when she began to exceed the limit, after which I ran the profiler and inserted some cache or reduced an unnecessarily wide brute force. After that, he poured onto the server, conducted several fights and making sure that everything proleased in time, he forgot until the next stop. The fact that my strategy still sometimes falls in time I found out by the second half of the final when I found the link “Fights with a fallen strategy” on the site. Now the profiler looked, there really were a couple of absolutely awful places in terms of speed.So on the move it was possible to increase the speed in 2 times on one of the fights, which fell in time without any changes in logic. In addition, there is always the potential for acceleration by raising the threshold for accurate calculation of soldiers, i.e., if you raise the rough estimate threshold with a probability of 0.01 to 0.02, you can still accelerate.

The time and resources I spent on strategy

In general, the time spent quite a lot. I had to write mostly in the evenings, but sometimes at work I could grab a lot of time. On the last Friday before the final, I devoted almost the whole day to the strategy; Here I also played the fact that I understood that I would have no time for her on Saturday, and if there was, I had very little, so I tried to lick everything to the maximum on Friday. So in general, it happened. On Friday evening, I watched the first wave of the final, slightly decided on my strategy, and went to bed with the thought “damn it, but a good start, okay, if it lasts until three in the morning, then maybe not 6k until the end of Sunday,” at 7:45 and a hard long day.

, , , , , , , .

visual studio, Localrunner , - . , , .

Future plans

- . , , , - .

, — .

!

Source: https://habr.com/ru/post/206680/

All Articles