As we wrote AI for the Jackal, and why he has schizophrenia

^{In the situation of a complete and hopeless ass, the power in AI was immediately taken away by the military, and this sometimes saved the party.}

For our AI, Jackal had to write as many as 4 different AI, each of which was responsible for their task. And then they got together and voted for what the pirates would do on the field this turn.

This entire complex system was needed because we did not have a single evaluation function. Roughly speaking, AI are of two types: when it is clear how to numerically evaluate your position, and when it is completely incomprehensible. The same game in Go at some point turns into a duel of intuitive conjectures, that is, it falls into a non-algorithmic task of complexity and uncertainty.
')
Long ago, my teacher, Professor Serbin, was telling a bike with a cheese-maker from Europe. Automators came to this charming fat man and asked how he makes such delicious cheese. They needed heuristics for temperature, density, and so on. The cheese-maker put his finger in the soft cheese, slowly and happily drew an arc there, smiled into all the sly face and said: “Well, how do you not feel!” .

Algorithmize the process of guests and could not. So, the development went like this: we drew arcs and smiled slyly, and the guys lost to us in the table Jackal. And they dreamed that the AI would take revenge for them.

What you need to know about the game, to understand what the ambush with AI

The core of AI was written by the developer idyury , who already had experience creating machine intelligence, but not for such strategies. The situation was somewhat complicated by the fact that he initially did not know how to play "Jackal", so the tests for the first six months were simply magical.

The principle of "Jackal" is quite simple - you play as three pirates, scouting the island and collecting treasures there. This is how the field looks like during the game:

Each cell is an adventure, such as a chest of gold, a crocodile, a cannon, a herd of wild horses, or an arrow that the pirate immediately follows. The combination of arrows and other elements creates a complex navigation graph, which in each new party determines the relief of the island and the "bottlenecks" for which the war is going.

^{Table cells}

^{For the screen had to change the "interface" of cells}

Actually, “Jackal” was originally created in a fairly simple form by teachers and students of Moscow State University back in the 70s. Until 2008, it was successfully played with the help of two sheets of paper, a pencil and knowledge of graph theory, and then we substantially reworked it and published it in the form of a board game. Devil's simplicity of development gave rise to incidents like a 12-year-old girl who took the second place in the tournament and parents writing to us how their four-year-old and five-year-old children are cut into the "island".

When the number of boxes sold in Russia exceeded 100,000 pieces, we decided to make an application. And here I needed AI.

The problem was that despite the existence of some general rules of the game leading to victory, there are a lot of options for achieving superiority over opponents. And these options for achieving excellence in different igromechanical planes lie: for example, either you have taken an advantageous position on the island, or have found a convenient treasure for carrying, or you have a balloon nearby, and so on.

One of the key difficulties was a fairly large set of possible cells on the playing field with different actions. The action of some cells was generally unique (for example, “airplane” or “aboriginal”). However, initially the task was to build a system that would structure all game information and unify all subsystems. I didn’t want to engage in “data fitting” and microtuning based on strategies - I needed a scheme that allows the game to be expanded with any type of cells.

Therefore, the developer sat down to saw some common data model for the game.

Data model

The developers have played with us and learned some common patterns of action within the game. The work began with the creation of several prototypes at once — systems for navigating the playing field, systems for assessing the current state and decision-making, systems for finding optimal paths, and various evaluation functions. Just at this moment we created an unpleasant legacy code, which didn’t manifest itself in half a year, and then suddenly began to interfere.

^{Side view of the playing field of a real game (this is the old version, where the coins were without relief).} ^{Here the black pirates were well established, and white could not take the gold without risking being beaten.} ^{But black also cannot take a coin.}

The fact is that in the desktop version the playing field consists of cells that must be turned over as the island is explored. And the very first model of the island assumed the graph of just such a shape and configuration, where each cell was a node.

^{Here is a linked graph in the end, where each cell of the navigation field is divided into slots, which are the nodes of the graph.}

The problem is that some desert-type cells are traversed in, for example, 5 moves, and in one direction. That is, in the place of one vertex there must be 5 with its own connecting vectors. At first, to facilitate testing in this place, a crutch was hammered, which gradually became enclosed with a code and became a supporting structure. What eventually gave rise to a number of crutches in the functions of finding the path and assessment later.

For each node of the graph, a list of checkboxes describing its current state from the point of view of the active player is set (for example, there is a flag indicating that the current node is under attack by the enemy pirate):

enum eLocks { eLock_Prohibited = 1 << 0, eLock_LandAttack = 1 << 1, eLock_SeaAttack = 1 << 2, eLock_TrapLock = 1 << 3, eLock_Unexplored = 1 << 4, eLock_Fortress = 1 << 5, eLock_ExcludeTrap = eLock_Prohibited | eLock_LandAttack, eLock_AllDanger = eLock_Prohibited | eLock_LandAttack | eLock_TrapLock, };

Flags are widely used for various purposes, but their main purpose is to facilitate search on the navigation network. The navigation graph is the same for all bots, but its context is constantly recalculated for each player individually, for example, the set of checkboxes for each node of the graph is unique for each player. If the cell on the playing field is still not explored, then in the graph it is presented as an empty cell:

^{Example graph for undiscovered cells}

Evaluation function

When the navigation model was ready, it was necessary to obtain an evaluation function, which allows to distinguish good moves from bad ones, and ingenious ones from good ones. Naturally, for a given variety, it was very difficult. In the end, after much deliberation, idyury identified 4 subsystems that determined your success in the game:

Exploration, that is, the assessment of the need to open new cells.
The “greedy” system, that is, the direct taking of coins from explored cells and their delivery to the ship.
The system of restoring pirates - when one of our pirates gets into trouble, it is often more rational to save him than to grab at the piastres or beat the enemy.
And the enemy pirates' attack system. This is the only aggressive subsystem whose task is to reduce the risks for the other three systems. Looking ahead, I will say that it turns on as an emergency if the risks for optimal action in the first three subsystems are too great.

I must say that the guys have already done chess and AAA-shooters with complex enemies, but they always had a good evaluation function at hand. It also implemented a simulation of alien moves based on its own logic of optimal actions, and then an estimate for each of the 4 subsystems. We came to a set of heuristics for each option. Then 4 subsystems “vote” for their moves (the weight of the voice changes depending on the situation on the board: for example, with one capable pirate of three, we will most likely drop everything and go to save friends). As a result of this rationing, one estimated function was obtained, which was then balanced for a long time. By the way, this is how different "characters" of bots are made - for the greedy one, we can simply increase the weight of the voice of the coin-taking system, and for the cautious player - the recovery system. In the current version, the balance is matched by eye after about two hundred test batches with experienced players. Later we will use the results of real games for learning AI.

And yes, AI itself should be simple at first, and gradually develop, corresponding to the level of the player’s training. But let's move on to the meat about decision-making subsystems.

AI intelligence

The logic of the cell opening system is quite simple - we choose a move (the move may not be a single move, but consist of many moves), which will increase the evaluation function. At the same time, each cell is assigned a certain weight, which is taken into account in the evaluation function.

The closer a cell is to the player’s base cell, the higher its weight. As the base cell of the player, a cell is selected in the middle of its coast (landing cell), but here you can experiment.

^{Cell weights when opening from the point of view of a white player on the first move.}

Naturally, if between us and the chest for 5 gold coins there is one undiscovered cell, the value of its discovery increases dramatically. From the point of view of other subsystems, I recall, it is considered as empty before the opening. The cage with the enemy and the cells under its attack lose weight dramatically.

With an equal (or comparable to about 5%) weight of two cells side by side (for example, at the beginning), we make a random move. By the way, the principle of randomness with the similarity of weight coefficients introduces a completely wild drive and unpredictability of AI into the game - this is very cool for players, but completely wild for debugging - we could not always correctly repeat the game to see where AI got confused - it was, roughly speaking, write your own random number generator, and not get acquainted with the peculiarities of the work of each on separate devices.

Next, the task is to open cells with a greater weight in a smaller number of moves. The more gold still remains on the map, the more votes the intelligence gets when normalizing the overall evaluation function. That is, the endgame intelligence is used by experienced pirates less and less, to the forefront of the problem of the delivery of coins and their push-ups from the enemy.

Coin collecting system

Comrade Sergeant Major Ensign Khvavaykin

The system of collecting coins is the most difficult in terms of processor load. Full brute force on client devices is impossible, therefore, simplifying, the branch and bound method is used, which is based on selective depth miscalculations.

In this case, the calculations are not based on a separate move, but on an action, which may include sequences of moves of various lengths. For example, when you need to sail a ship to any possible point, or go pirate on a cage with a coin. Actions can be combined into more complex actions, for example, you can take a ship to a coin directly by the shortest route (not necessarily on foot, a cannon, a balloon or a horse are also counted), but you can take it to the coast, where a ship can be fitted by another pirate. Search in depth searches available moves a few steps forward and selects the best. The current version for iPad 2 and iPhone 4 uses a depth of 2-3 moves, depending on the situation, on top devices it can be considered to be 4 turns with a wait of less than a second. There is much to optimize according to the logic of the algorithm, but so far there is enough depth for the bots to play well at the tactical level.

At the same time, the pirate still avoids undesirable encounters with enemies. And considers where they can go at a given depth of the forecast.

Undiscovered cells are not counted in the calculation of the path when a player carries a coin to the ship in order to reduce the risks of its “loss”. When calculating the path in other cases, it can be laid in closed cells with the assumption that they will be empty. After the “intelligence” of the cell, the navigation graph is completed.

^Piastrrry!

Rescue subsystem

A pirate can fall into the water (then you need to fit the ship with another pirate to save him), can fall into a trap (need a second friend to pull him out), can be eaten by a cannibal (have to resurrect him in an aboriginal fortress) or get into an endless cycle (for example, the arrow on the crocodile), which will lead to the fact that he will go crazy and die. For the resurrection, too, need a fortress and an aboriginal.

So, the logic of the pirates recovery system is also not as simple as it may seem. The system takes into account not only completely dead pirates, but operates with the concept of "inactive pirate." At the same time, the pirate on the Roma is not considered inactive, since he is able to heal himself by the next day’s lunch.

Assessing the need for recovery, the system takes into account not only the number of inactive pirates of the player, but also the number of inactive pirates of opponents and an ally. In the case of deciding whether to restore, the system chooses from the possible recovery options by simply comparing the number of primitive moves, taking into account the position of the enemy.

Simplifying, if our friend fell into a trap in the next cell, the evaluation function assigns 10 difficulty points to recovery. And if in two cells - the complexity grows nonlinearly, for example, it already becomes 30 points. Further, based on the primary possibility of recovery, hypotheses are discarded that are not needed in the current game situation (so as not to count the whole complex every turn when you need to do other things). The importance of recovery grows depending on the game situation, the number of our active pirates and the ease of recovery.

If the recovery is still with high probability, we consider the possibilities - for example, to swim up the player to the ship or be killed about someone else’s ship and occupy the fortress in N moves.

Attack subsystem

The idea of an enemy pirates' attack system is similar to the classic task of calculating moves, for example, in chess. We have a certain evaluation function that takes into account how many potential pirates can attack enemy pirates (yes, the fortress and the aboriginal here, of course, greatly change the situation). To make a decision about the attack, the system also calculates the depth, but it does not operate with actions, but with primitive steps. Moreover, at each level, several positions are selected with the best ratings. The complete miscalculation includes the miscalculation of all the pirates of all players with control over the coins (how many coins are currently “protected” by the pirate, as well as their proximity to the ship). As a result, we get a decision tree, on the basis of which the final decision on the attack is already made.

Despite the miscalculation in depth and the presence of actions performed, each move is a new assessment. As a result, the pirate can quickly change his action depending on the changing situation on the playing field. For the miscalculation, pairs of pirates are selected according to the “I am alone and opponents” scheme for 2 turns deep into the minimum.

The tactical block takes into account the players' influence on the cells of the field and their influence on the gold on the field - this is a question of the safety of our operations and the potential benefits from the fact that we rush into other people. The main question is how much the pirate controls gold in the perspective of several moves. Again, if our pirates directly control the gold or virtually control the gold after 2 turns, then the priority in the voice will be given to the coin-pulling system, most likely. Because AI knows what's better tit in the hands. Also, the expediency of including this system directly depends on the need to clear the space for security - if the enemies are not close, it does not even turn on after the fact.

Total

The game has a cooperative mode. At the same time, it turned out to be very convenient with him - he does not break the logic of the work of any system, being, in its essence, just an extension and a superstructure over the main logic. Allied pirates and ship, with a few exceptions, are considered as their own. The only important point - the balloon sends the pirate strictly to your ship, but it was decided to evaluate it very simply.

All systems operate practically independently of each other. Each of them, based on the navigation system, determines the available solutions and makes an assessment. The system of decision-making on the basis of the obtained estimates and taking into account the context chooses the best option. It is important that the proposed solutions did not worsen the position of the player on the field. If there are no such solutions at the moment (by analogy with hopelessness), the system will choose any option to attack the enemy, even if it worsens the position. Yes, yes, here we recalled that AI contest about silent animals, where Russian cows ate alien grass.

^{Screenshot debug mode.} ^{Blue is the current position, yellow is the selected best move.} ^{Output format: number of pirates (minus means opponents) |} ^{difference attacking a cage of their own and others protecting it |} ^{the prospect of cells for collecting gold |} ^{total cell count.} ^{On the screen you can see that the yellow one decided to go diagonally and occupy a cage with which you can attack another cage with enemy pirates and with a reserve of gold.} ^{The position has improved from -20.61 to -12.37.}

Implementation and tests

Here I will quote:

The first was implemented navigation system. On many projects, this task is solved in approximately the same way; it was only necessary to take into account the specifics of the Jackal (it was necessary, for example, to refine the algorithms after it turned out that the fallen coins lay in the subslots of the “long” cells, and not on the cell as a whole).

Having the navigation, an attempt was made to follow the path of least resistance and implement the usual algorithm for calculating the depth. It quickly became clear that nothing good would come of it. Too many possible moves for each pirate. Plus there were difficulties with the formalization of the behavior of cells of various types. Therefore, further development was in steps. The system of opening cells and collecting coins was made almost in parallel. The first, of course, was completed much earlier because of its relative simplicity. The second developed and improved until the end of development. After them the pirates recovery system was implemented. The attack system of the enemy pirates was done last. In principle, it can be said that there were no particular difficulties in the process of work, after all, the experience gained earlier had an effect. Major alterations were associated with not quite complete or misunderstanding of the rules of the game, for example, should a pirate who came to the crocodile through ice die or go back to where he came from.

The initial testing was carried out automatically at the design stage. The game control system itself is built in such a way that it works universally. It was originally laid for multiplayer game features. Any player can be controlled by one of three types of controllers: human, computer, or remote player (remote player can also be human or computer). At any time in a game, a controller of one type can be replaced with a controller of another type (for example, a computer can take the place of a player who has fallen off the network, but subsequently a player can reconnect and return to the game). For testing the Jackal in general and the AI, in particular, it played a very good service. The game started with computer players and they played with themselves. It remained only to monitor their behavior and collect statistics. The game now uses an SFMT random number generator, which allowed it to be replayed along the moves, if at some stage a moment arose in it that required debugging or modification. This technique is very convenient to identify and localize many problems, for example, the eternal cycles in the navigation of pirates, illogical or incorrect actions and so on. The second stage of testing was testing in a game with real people, which was built on similar principles, however, a specially developed system of logging game events was used to play the game session, recording and saving all the games played on the disc. Subsequently, the game session was simply reproduced in turns.

There are ideas and plans to create, for example, different behavioral types of pirates - cowardly, aggressive, researcher and others. I would like to implement different levels of difficulty for players with different levels of training.

Actually, in the comments you can ask questions idyury on the code and mathematics. Realization of all this here in Appstore , only for iPad for now.

And yes, AI is very lively lacking tactics experienced players like situations where a chest with 5 coins in two cells from the ship. An experienced player passes by and leaves pulling at the end of the game, but in our AI the “greedy” gets more votes and starts pulling, sticking for at least 20 moves, which could give a huge strategic advantage. But, of course, if we increase the depth of the situation on the field, the game will go slower, but more fun in terms of complexity.

In general, I just wanted to say that schizophrenia is cool. Sometimes.

Source: https://habr.com/ru/post/270529/

All Articles