
On artificial intelligence in poker



Poker has long attracted researchers of all kinds, from amateurs to serious scientists. It is no secret that this close attention correlates with the growing popularity of online gambling, which flourished in the 2000s. By now, humans have already ceded limit heads-up Texas Hold'em to the machine, while in no-limit and multi-player poker disciplines people still hold the upper hand. The greatest contribution to building strong computer poker players has undoubtedly been made by the research group at the University of Alberta, and the family of algorithms they proposed for finding equilibrium strategies is by far the most popular and widely used tool for constructing poker agents. But first things first.

This publication focuses solely on the problems of artificial intelligence in poker; questions of extracting information from online casino client software, bypassing security, and imitating user actions will not be considered. It is assumed that the reader is already familiar with the rules of poker.

On the complexity of poker for machine learning


Poker is notable for being a game with incomplete information: the players do not see each other's cards. It also contains an element of chance: the players' hole cards and the board cards are dealt randomly from the deck. To these two aspects a third is added: the number of game states, which in some poker disciplines is not far behind chess. To appreciate the scale of the problem facing a researcher, consider the limit heads-up variant of Texas Hold'em. The figure below schematically shows the structure of such a game, indicating the new information that appears at each betting round.
[Figure: betting-round structure of limit heads-up Texas Hold'em, with the new information revealed at each round]


The variant of poker under consideration has on the order of 10^18 game situations, which makes the task, to put it mildly, unsolvable by brute force, since it would require utterly inconceivable computing resources. It should be noted, however, that some card combinations are equivalent to one another: preflop, the starting hand of two aces A♥A♦ is strategically identical to any other pair of aces. Thus, for the first betting round, in which players hold only two cards, one can distinguish just 169 nonequivalent two-card combinations. But even after grouping card combinations into equivalence classes, Texas Hold'em still has about 10^14 distinct game states, which would require dozens of petabytes of memory to store, to say nothing of the time needed to teach all of this to play poker. Need one even mention poker variants with three or more players?
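
The figure of 169 is easy to verify by direct enumeration: a two-card starting hand is determined, up to equivalence, only by the two ranks and by whether the cards share a suit. A minimal sketch (purely illustrative):

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def hand_class(card1, card2):
    """Map a concrete two-card hand to its equivalence class:
    the two ranks (high first) plus a suited/offsuit marker."""
    (r1, s1), (r2, s2) = card1, card2
    hi, lo = sorted((r1, r2), key=RANKS.index, reverse=True)
    if hi == lo:
        return hi + lo                                 # a pair, e.g. "AA"
    return hi + lo + ("s" if s1 == s2 else "o")        # e.g. "AKs" / "AKo"

deck = [(r, s) for r in RANKS for s in SUITS]
classes = {hand_class(c1, c2) for c1, c2 in combinations(deck, 2)}
print(len(classes))  # 169 = 13 pairs + 78 suited + 78 offsuit combinations
```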

A common approach when the original game is too large for direct study is to move to an abstract game (an abstraction) that merges states of the full game which are similar in a strategic sense. In other words, when analyzing and solving the game, some states of the original game are treated as indistinguishable. As an example of abstraction in poker, consider the preflop, where, as is well known, the best starting hands are a pair of aces, AA, and the suited ace-king, AKs. It is fairly clear that the strategies for playing these hands in the first betting round differ only slightly, so one can, without any qualms, merge these two holdings into one abstract class of starting hands. The same trick can be repeated for other card combinations, producing, as you have already guessed, an "abstract" poker that is far less demanding of computing resources.
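
One common (though by no means the only) criterion for such grouping is hand strength, for example all-in equity against a random hand. A minimal sketch, assuming a precomputed `preflop_equity` table mapping each of the 169 hand classes to its equity; the bucket count is arbitrary:

```python
def make_buckets(preflop_equity, n_buckets=8):
    """Group the 169 preflop hand classes into n_buckets abstract classes
    by ranking them on (assumed precomputed) equity vs. a random hand."""
    ranked = sorted(preflop_equity, key=preflop_equity.get, reverse=True)
    size = -(-len(ranked) // n_buckets)            # ceiling division
    return {hand: i // size for i, hand in enumerate(ranked)}

# With 8 equal-sized buckets, the top bucket holds roughly the 20 strongest
# hands, so AA and AKs (along with KK, QQ, AKo, ...) share one abstract class.
```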

There are other ways to build abstractions, applying not only to card combinations but also to betting sequences in no-limit disciplines, which ultimately yields an abstract game that can be studied and analyzed. The key point is that in the transition to an abstraction some information about the original game is lost; the more significant the lost information, the more vulnerable the resulting poker agent.

Let us now look at the basic approaches to building strategy profiles for poker.

Expert Systems

The very first poker agents were expert systems: sets of if-then rules for various game situations. Such rules were coded in by hand with the help of poker players themselves and/or derived from the analysis of a large number of real hand histories. Either way, the drawback of this approach is obvious: an artificial opponent is easily exploited once its weak spots are identified. Of course, these weaknesses can always be patched by adding new entries to the rule base, but this brings a host of troubles related to growing maintenance complexity, debugging, and the degrading performance of the system as a whole.
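
For illustration, the core of such an expert system might look like a hard-coded preflop rule table; the hand groups and thresholds below are invented for the example and not taken from any real bot:

```python
PREMIUM = {"AA", "KK", "QQ", "AKs", "AKo"}
PLAYABLE = {"JJ", "TT", "99", "AQs", "AQo", "AJs", "KQs"}

def preflop_action(hand_class, facing_raise):
    """A toy if-then rule set. Its weakness is exactly the one described
    above: the mapping from situation to action is fixed, so an observant
    opponent can reverse-engineer and exploit it."""
    if hand_class in PREMIUM:
        return "raise"
    if hand_class in PLAYABLE:
        return "call" if facing_raise else "raise"
    return "fold"
```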

Today, poker agents of this type are hardly ever used in their pure form because they are so easily exploited. The best-known representatives of this class of bots are the first versions of Loki and Poki, FellOmen, and the Shanky Bot, widely known in narrow circles.

Exploiters

The basic idea behind exploiting poker bots is to build counter-agents to known opponents, using their weaknesses to extract the maximum profit from the game. As a rule, they rely either on modeling the opponent's behavior in a particular game situation or on strategy profiles precomputed against a specific opponent.

Decision making in systems with opponent modeling comes down to replaying many "virtual" hands with different outcomes in order to find the most profitable action. The adequacy of such poker agents is determined by the adequacy of the opponent model. When it comes to modeling a person's decisions, one inevitably recalls Shannon's mind-reading machine, which guessed the player's move in nine cases out of ten. Most often, neural networks are used to model live opponents; according to some reports, they can predict player actions with roughly 80-90% accuracy. This makes this type of poker bot the most attractive for playing in online casinos, since modern GPU technology makes it quick and easy to build an opponent model and compute a profitable decision.
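
Schematically, such decision making can be sketched as replaying many "virtual" continuations for each candidate action and keeping the one with the best average outcome. In the sketch below, `opponent_model.sample_response` and `simulate_payoff` are hypothetical stand-ins for the opponent model and the hand simulator:

```python
def choose_action(state, actions, opponent_model, simulate_payoff, n_rollouts=1000):
    """Pick the action with the highest average payoff over simulated
    continuations, where the opponent's replies are drawn from the model."""
    best_action, best_ev = None, float("-inf")
    for action in actions:
        total = 0.0
        for _ in range(n_rollouts):
            reply = opponent_model.sample_response(state, action)  # modelled opponent
            total += simulate_payoff(state, action, reply)         # play out the hand
        ev = total / n_rollouts
        if ev > best_ev:
            best_action, best_ev = action, ev
    return best_action
```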

In academic circles, exploiters based on computing best-response strategies are well known. The idea is that, within a given abstraction, a counter-strategy is computed against a known opponent. This allows the weaknesses of a particular opponent to be exploited as effectively as possible, but against an arbitrary opponent the results can be disastrous. Moreover, building such strategy profiles requires significant computational resources, although there have been recent advances in this area as well.
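
At a single decision point this reduces to a very simple calculation: given the opponent's estimated mixed strategy, the best response is just the row of the payoff matrix with the highest expected value. A minimal sketch with numpy, using the Rock-Paper-Scissors game that appears again in the next section:

```python
import numpy as np

# Row player's payoffs (rows: our action, columns: opponent's action).
RPS = np.array([[ 0, -1,  1],   # rock
                [ 1,  0, -1],   # paper
                [-1,  1,  0]])  # scissors

def best_response(payoff_matrix, opponent_strategy):
    """Return the pure strategy maximising expected payoff against a
    fixed (estimated) mixed strategy of the opponent, and its value."""
    expected = payoff_matrix @ opponent_strategy
    return int(np.argmax(expected)), float(np.max(expected))

# Against an opponent who throws rock 60% of the time:
print(best_response(RPS, np.array([0.6, 0.2, 0.2])))  # (1, 0.4): play paper, EV = +0.4
```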

Every exploiting bot has one drawback: it needs an initial strategy, since it is obviously impossible to model and exploit an opponent after a dozen hands. In fact, it takes tens of thousands of hands against a specific opponent before exploitation begins to bear fruit. Usually, while statistics on the opponent are being collected, a poker bot of another type is used. When playing against people, another common approach at the initial stage is to train the exploiter in advance on hand histories against a group of players sharing common characteristics, such as the frequency of entering the pot or the willingness to go to showdown. Then, as hands are played, the characteristics of the specific opponent become clearer and clearer, allowing the exploiter to quickly select and train an adequate counter-strategy.

Exploiters play far better than expert systems, since they can identify the weak spots of the latter. Their effectiveness against arbitrary opponents is largely determined by three factors: the initial strategy, the statistics collected on the opponent, and the adequacy of the model used. The best-known bots of this class are BRPlayer and Vexbot, familiar to many from the Poker Academy.

Equilibrium strategies

Until relatively recently, purely game-theoretic methods of building poker bots were more a matter of academic interest, since their implementation ran into computational difficulties. This primarily concerns finding a Nash equilibrium.

Let us dwell on the concept of equilibrium strategies using the well-known game Rock-Paper-Scissors, played a large number of times. Suppose the first player follows a pure strategy and always plays rock. Then the second player, having seen through the first player's game, will always play paper to maximize his profit. To stop losing, and even to maximize his own winnings, the first player needs to switch to scissors. Obviously, the second player will then change his strategy to rock, and the whole cycle of strategy changes will repeat. It is easy to see that if the first player threw rock, scissors, or paper with equal probability, then the second player, no matter how hard he tried to adjust his strategy, would not be able to exploit his opponent.

The example above can be viewed as a game between two exploiters, each constantly adjusting a mixed strategy to the opponent. Sooner or later the strategies of both players will converge to throwing rock, scissors, or paper with equal probability. This particular situation is called a Nash equilibrium: while in equilibrium, neither player can improve his result by unilaterally changing his strategy.
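
This is easy to check numerically: against the uniform mix every reply earns exactly zero, while any skewed strategy immediately hands a best-responding opponent an edge. A small sketch:

```python
import numpy as np

RPS = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # rock / paper / scissors payoffs

uniform = np.ones(3) / 3
print(RPS @ uniform)          # [0. 0. 0.]: no reply does better than break even
skewed = np.array([0.5, 0.25, 0.25])                  # a rock-heavy player...
print((RPS @ skewed).max())   # 0.25: ...loses 0.25 per round to an opponent who shifts to paper
```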

As mentioned above, finding exact equilibrium strategies for full-scale poker is out of the question: it will be a long time before humanity has enough resources to solve games of this size. Researchers have therefore focused on equilibrium strategies for abstractions of poker, but this raises the question of how closely the equilibrium of the abstract game corresponds to the equilibrium of the full game. For this reason, the Nash equilibria of an abstract game are usually called approximate equilibria, or ε-equilibria, where ε stands for the potential exploitability of the strategy.

For ε-equilibria, the main characteristic is how they perform against a worst-case exploiter, or, in other words, how far the found ε-equilibrium is from the true equilibrium of the original game. As one would expect, using too coarse or inadequate an abstraction yields a coarse and inadequate ε-equilibrium. Equally expectedly, the less information about the full game is lost in the transition to an abstraction, the larger the abstraction itself and the higher the demands on the learning algorithms and computing resources. This, in turn, has driven the search for and development of new methods of finding equilibria in such large games.

To date, several effective algorithms exist for finding equilibria in large abstractions of poker. The most popular are algorithms based on regret minimization. This family of algorithms is being actively developed by the University of Alberta research group and deserves a separate discussion. I will only note that one of these algorithms was used to train the poker bot Slumbot, in which the first three betting rounds are represented without any abstraction at all. Last year this bot took first place at the Annual Computer Poker Competition.
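
The simplest member of this family, regret matching, can be shown on the same Rock-Paper-Scissors example: each player mixes over actions in proportion to accumulated positive regret, and the average strategies converge to the equal mix. A toy self-play sketch (in counterfactual regret minimization the same update is applied at every information set of the abstract poker game, which is what this sketch leaves out):

```python
import numpy as np

rng = np.random.default_rng(0)
RPS = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])   # row player's payoffs

def regret_matching(cum_regret):
    """Mix over actions in proportion to positive accumulated regret."""
    positive = np.maximum(cum_regret, 0)
    total = positive.sum()
    return positive / total if total > 0 else np.ones(3) / 3

regret = [np.zeros(3), np.zeros(3)]
strategy_sum = [np.zeros(3), np.zeros(3)]

for _ in range(100_000):
    strategies = [regret_matching(r) for r in regret]
    actions = [rng.choice(3, p=s) for s in strategies]
    for p in (0, 1):
        strategy_sum[p] += strategies[p]
        # Payoff of each of our pure actions against the opponent's realized action.
        payoffs = RPS[:, actions[1]] if p == 0 else -RPS[actions[0], :]
        # Regret = what each pure action would have earned minus what we earned.
        regret[p] += payoffs - payoffs[actions[p]]

# The *average* strategy, not the latest one, converges to the equilibrium.
print(strategy_sum[0] / strategy_sum[0].sum())   # ≈ [0.33, 0.33, 0.33]
```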

The main advantage of equilibrium strategies is their non-exploitability. The flip side is that such strategies do not win particularly much even against very weak poker players. The best-known representatives of this class of poker agents are Hyperborean and Polaris, the first bot to convincingly beat human poker professionals.

Instead of a conclusion

This mini-review does not claim to be a complete treatment of the problems of computer poker, but it should give a rough idea of the current state of affairs in the field. Perhaps, if all goes well, I will continue to explore this topic.


Source: https://habr.com/ru/post/173273/

