
An eye for an eye

In the well-known game-theory problem called the Prisoner's Dilemma, defection is the only rational choice. However, if the two sides are still not allowed to negotiate with each other, but the situation is repeated many times in a row, this strategy stops being the most profitable. Choosing the right strategy can help answer questions about the evolution of human society, the emergence of cooperation in personal and business relationships, and the relationship between moral norms and self-interest.



In the late 1970s, Robert Axelrod (a mathematician and political scientist, now a professor at the University of Michigan) devised a tournament to model the behavior of subjects facing the iterated prisoner's dilemma (IPD).

The rules of the game were as follows: participants submitted programs that played against each other in pairs for 200 rounds per match. In each round, both players simultaneously chose either to cooperate or to defect. Mutual cooperation earned each player 3 points, mutual defection 1 point each; if one player defected against a cooperator, the defector received 5 points and the cooperator 0.

The simplest algorithms either always cooperated (naively "nice") or always defected. Most followed more complex patterns of behavior. But the winner was the program Tit-For-Tat (TFT, "an eye for an eye"), submitted by psychologist Anatol Rapoport of the University of Toronto. The program's logic was very simple: unconditional cooperation in the first round, and in each subsequent round a repetition of whatever the same opponent did in the previous one. The maximum possible result over 200 rounds is 1000 points; the winner received 504. The first eight places were taken by programs that opened with cooperation on first contact; they became known as "nice". They received from 472 to 504 points, while the most successful "nasty" program received 401.
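To make the mechanics concrete, here is a minimal Python sketch of such a match, assuming the standard payoff values given above; the 200-round length comes from the text, while the function names and the set of strategies are illustrative:

```python
C, D = "C", "D"

# Standard payoffs: (my points, opponent's points) for each pair of moves.
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(my_history, opp_history):
    """Cooperate in round one, then repeat the opponent's previous move."""
    return C if not opp_history else opp_history[-1]

def always_defect(my_history, opp_history):
    return D

def play_match(strat_a, strat_b, rounds=200):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pts_a, pts_b = PAYOFF[(move_a, move_b)]
        score_a += pts_a
        score_b += pts_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play_match(tit_for_tat, tit_for_tat))    # (600, 600): stable cooperation
print(play_match(tit_for_tat, always_defect))  # (199, 204): TFT never outscores its partner
```

The second result illustrates the point made below: TFT loses slightly to a pure defector in any single match, yet wins tournaments on total points.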
In the second tournament, 62 programs were entered, and many of the algorithms had been improved, some specifically to beat TFT. It is interesting to note that TFT can never earn more points than its current partner in a single match, yet TFT was again the winner on total points. The program willingly cooperated with other cooperators, but immediately answered deception with deception.

However, in the real world subjects do not act as deterministically as programs, so subsequent experiments introduced the possibility of error: with some probability, the subject chooses an action at random. Under these conditions, TFT, when meeting its own twin, could fall into an endless cycle of mutual revenge, in which a single erroneous move triggered alternating retaliation. At a noise level of 10%, it was no longer the winner. Later modifications, CTFT (Contrite Tit-For-Tat) and GTFT (Generous Tit-For-Tat), which include the ability to forgive defections, significantly improved the results at high noise levels.
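A rough sketch of how noise and forgiveness can be modeled, assuming a simple scheme where each intended move is flipped with probability `noise`; the 10% noise level comes from the text, while the 0.1 forgiveness probability of this GTFT variant is an arbitrary illustrative choice:

```python
import random

C, D = "C", "D"

def noisy(move, noise=0.10):
    """With probability `noise`, the opposite action is played by mistake."""
    return (D if move == C else C) if random.random() < noise else move

def tit_for_tat(opp_history):
    return C if not opp_history else opp_history[-1]

def generous_tit_for_tat(opp_history, forgiveness=0.1):
    """Like TFT, but occasionally forgives a defection instead of retaliating."""
    if not opp_history or opp_history[-1] == C:
        return C
    return C if random.random() < forgiveness else D

def cooperation_rate(strat_a, strat_b, rounds=200, noise=0.10):
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = noisy(strat_a(hist_b), noise)  # each strategy sees its opponent's history
        b = noisy(strat_b(hist_a), noise)
        hist_a.append(a)
        hist_b.append(b)
    return sum(a == b == C for a, b in zip(hist_a, hist_b)) / rounds

print(cooperation_rate(tit_for_tat, tit_for_tat))                    # revenge cycles drag this down
print(cooperation_rate(generous_tit_for_tat, generous_tit_for_tat))  # forgiveness keeps it noticeably higher
```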

The experiment was later expanded to include elements of Darwinian evolution. After each round, the subjects could choose a new strategy, with the probability of choosing each strategy proportional to the number of points it had gained. At the beginning of the game, TFT and the other cooperative strategies practically disappeared from the population, and the deceivers ruled the show! The average payoff fell almost to one, but after a while the remnants of the TFT strategies suddenly took over, and later gave way to strategies more prone to forgiveness. That is when harmony and cooperation reigned in the world. However, this result is not final. When the experiment was repeated or extended over hundreds of thousands of generations, one strategy or another prevailed in different epochs, and troubled times, when deceivers reigned and the average payoff was about one, recurred many times.
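The evolutionary step can be sketched as follows; the population size, match length, generation count and the three competing strategies are arbitrary illustrative choices, but the resampling rule (probability proportional to accumulated payoff) is the one described above:

```python
import random

C, D = "C", "D"
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tft(opp):   return C if not opp else opp[-1]
def all_c(opp): return C
def all_d(opp): return D

STRATEGIES = {"TFT": tft, "ALLC": all_c, "ALLD": all_d}

def match_payoff(name_a, name_b, rounds=50):
    """Average per-round payoff of two strategies playing an IPD match."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        a, b = STRATEGIES[name_a](hb), STRATEGIES[name_b](ha)
        pa, pb = PAYOFF[(a, b)]
        sa, sb = sa + pa, sb + pb
        ha.append(a)
        hb.append(b)
    return sa / rounds, sb / rounds

def generation(population):
    """Pair agents at random, score them, then resample every agent's
    strategy with probability proportional to the strategy's total payoff."""
    random.shuffle(population)
    fitness = {name: 0.0 for name in STRATEGIES}
    for i in range(0, len(population) - 1, 2):
        a, b = population[i], population[i + 1]
        pa, pb = match_payoff(a, b)
        fitness[a] += pa
        fitness[b] += pb
    names = list(STRATEGIES)
    weights = [fitness[n] + 1e-9 for n in names]  # tiny floor avoids a zero-weight error
    return random.choices(names, weights=weights, k=len(population))

pop = ["ALLD"] * 60 + ["TFT"] * 20 + ["ALLC"] * 20
for _ in range(30):
    pop = generation(pop)
print({n: pop.count(n) for n in STRATEGIES})  # naive cooperators usually crash first, then TFT takes over
```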



Even in prosperous epochs in a welfare society, your friend or counterparty may turn out to be Pavlov. In the early experiments, a real turncoat turned out to be the WSLS program, with simple logic based on repeating its move in case of success and changing it in case of failure (win-stay, lose-shift). As long as honest business pays, it cooperates, but once it has successfully deceived a simpleton, it deceives him again and again while it brings profit.
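A sketch of the Pavlov rule, taking "win" to mean the usual convention of scoring at least 3 points (mutual cooperation or successful exploitation) in the previous round:

```python
C, D = "C", "D"
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def pavlov(my_history, opp_history):
    if not my_history:
        return C
    my_last = my_history[-1]
    last_payoff = PAYOFF[(my_last, opp_history[-1])][0]
    # Win-stay: keep the move after 3 or 5 points; lose-shift: switch after 0 or 1.
    return my_last if last_payoff >= 3 else (D if my_last == C else C)

# Against an unconditional cooperator, a single (perhaps accidental)
# defection pays 5 points, so "win-stay" locks Pavlov into exploitation.
hist_p, hist_c = [D], [C]   # suppose round one was an erroneous defection
for _ in range(5):
    hist_p.append(pavlov(hist_p, hist_c))
    hist_c.append(C)
print(hist_p)  # ['D', 'D', 'D', 'D', 'D', 'D']
```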

The NetLogo multi-agent modeling environment contains a simple implementation of the iterated prisoner's dilemma (the PD N-Person Iterated model). In the model, an arbitrary number of subjects (turtles, in NetLogo terminology) move across the field and, on colliding with each other, make a choice based on the history of their relationship. In fact, these turtles remember only the most recent encounter with each opponent and do not have access to the full history. Initially, 10 turtles are created for each of six strategies: random choice, cooperate, defect (cheat), TFT, unforgiving (refuses any cooperation after a single defection), and unknown, which is programmed identically to TFT. As in the evolutionary model, over the first hundreds of iterations the most successful in terms of winnings are the cheaters (their average payoff is close to 5 points). But gradually their result deteriorates, and TFT takes the lead with an average score of 2.7 points against 2 for the cheaters. With the cooperators excluded, the cheaters average only 1.4, and with the random strategy also excluded, the cheaters can only fool each other and average 1 point.
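This is not the NetLogo code itself, but the model's key idea, an agent that remembers only each partner's most recent move, can be sketched in a few lines of Python:

```python
class TFTAgent:
    """Tit-for-tat with single-encounter memory, as in the NetLogo turtles."""

    def __init__(self):
        self.last_move_of = {}  # partner id -> that partner's last observed move

    def choose(self, partner_id):
        # Cooperate with strangers; otherwise mirror the partner's last move.
        return self.last_move_of.get(partner_id, "C")

    def observe(self, partner_id, their_move):
        # Overwrites any earlier memory: only the latest encounter is kept.
        self.last_move_of[partner_id] = their_move
```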

Based on the experiments that have been carried out, four commandments for success in this game can be formulated, and they can be carried over into real life:

- do not be envious: do not try to score more than your partner;
- do not be the first to defect;
- reciprocate both cooperation and defection;
- do not be too clever.

As shown above, these rules do not guarantee success over a short period, and the result depends heavily on the other players' strategies and on the level of random noise, but together they form a very simple and at the same time robust and universal algorithm of behavior in such a game.

Literature:
Philip Ball. Critical Mass.
Robert Axelrod. The Evolution of Cooperation.

References:
www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
www.ncbi.nlm.nih.gov/pmc/articles/PMC2460568
www.prisoners-dilemma.com

Source: https://habr.com/ru/post/166659/

