
Akinator and Mathematics

The topic of Akinator has already come up on Habré several times, including under the tag "I don't know how it works". I stumbled upon it recently and, of course, was delighted. Then, like many others, I wondered: "But how does it work?" I did not find an answer to this question, so I set myself the goal of writing a program with similar functionality and figuring out what is going on along the way.

Functional requirements


The first step is to figure out what the words "a program with similar functionality" actually mean. After thinking a little, I identified the following requirements: the program must guess the object the player has in mind by asking questions, it must forgive the user's mistakes and inaccurate answers, and it must learn from the games it plays.

Algorithms


If it weren't for the requirement to forgive mistakes, achieving the goal would be fairly easy. For example, we could store a tree of answers to questions, in which internal vertices correspond to questions and leaves to answers. A game would then look like a descent from the root to one of the leaves (a sketch of this naive approach is given below). However, this algorithm cannot cope with forgiving mistakes, and the question of keeping the tree balanced arises as well.
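
Here is a minimal sketch of that decision-tree approach, just to make the objection concrete (the Node class and the ask callback are my own illustration, not part of the original program). Note how a single wrong answer sends the descent down the wrong branch with no way to recover:

```python
class Node:
    def __init__(self, question=None, answer=None, yes=None, no=None):
        self.question = question  # internal vertex: the question to ask
        self.answer = answer      # leaf: the object to name as the guess
        self.yes, self.no = yes, no

def play(node, ask):
    """Descend from the root to a leaf; ask(question) must return True/False."""
    while node.answer is None:
        node = node.yes if ask(node.question) else node.no
    return node.answer
```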

In a sense, the tree is a very "mechanistic", "machine-like" way of playing, extremely fragile to the slightest inaccuracy. We need to play the way a rational person would. Those more or less familiar with probability theory should know that it has a so-called Bayesian interpretation, as well as a Bayesian approach built on top of it. The basis of this approach is describing knowledge with distributions of random variables and then turning a priori knowledge into a posteriori knowledge, based on observations, using the famous Bayes formula. Moreover, this approach is the only generalization of the classical algebra of logic to the case of uncertainty (this statement is known as Cox's theorem). That leads many scientists to consider the Bayesian approach the standard of rational thinking. Well, that is exactly what we need. Let's try to apply it to our problem.

Bayesian model


So, recall Bayes' formula: P(A | B) = P(B | A) · P(A) / P(B). Now in words. Suppose we need to estimate the probability that event A occurred, given that event B definitely happened (that is, we were guaranteed to observe it; this is why B is often called an observation). According to Bayes' formula, this probability is proportional to the product of two others. The first of these, P(B | A), is called the likelihood and shows how probable B is, given that A occurred. The second factor, P(A), is the so-called prior probability of A, that is, the probability that A happens at all (regardless of B). In effect, this probability reflects what we knew about A before we learned that B happened. The denominator also contains the value P(B), but here it merely plays the role of a normalization factor and can be ignored.
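
A tiny worked example may help; all the numbers here are invented purely for illustration. Suppose 1% of picked characters are fictional detectives, a fictional detective answers "yes" to "Does your character smoke a pipe?" with probability 0.9, and that answer is given with probability 0.05 overall:

```python
p_a = 0.01          # prior P(A): the character is a fictional detective
p_b_given_a = 0.9   # likelihood P(B | A): a detective answers "yes"
p_b = 0.05          # normalizer P(B): anyone at all answers "yes"

p_a_given_b = p_b_given_a * p_a / p_b  # Bayes' formula
print(p_a_given_b)  # 0.18 -- a single "yes" raised the probability 18-fold
```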
Applying this formula to the question game is fairly easy. Suppose Ai is an event of the form "the player has object i in mind", where i may be anyone from Spock to the Virgin Mary. Since B is an observation of Ai, it is natural to assume that B consists of the answers to the questions. The only option I see here is to represent B as a joint event: "question Q1 was answered with R1, ..., question Qk was answered with Rk". Then P(Ai | B) shows, for object i, the probability that it is the one the player has in mind (given the answers to the k questions). This is exactly the value we are interested in. Having selected the object with the maximum P(Ai | B), we can, if that value is large enough, try it as a guess.
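
A minimal sketch of this posterior computation, assuming hypothetical prior(obj) and likelihood(obj, question, answer) callables (concrete versions are sketched further below). Since P(B) is the same for every object, we can compute unnormalized scores and normalize at the end:

```python
def posterior(objects, answers, prior, likelihood):
    """answers: the (question, answer) pairs the player has given so far."""
    scores = {}
    for obj in objects:
        score = prior(obj)                              # P(Ai)
        for question, answer in answers:
            score *= likelihood(obj, question, answer)  # P(Bj | Ai)
        scores[obj] = score
    total = sum(scores.values())                        # plays the role of P(B)
    return {obj: s / total for obj, s in scores.items()}
```

In a real implementation it is safer to sum log-probabilities instead of multiplying, so that long games do not underflow to zero.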

The prior probability P(Ai) can be viewed as a special case of P(Ai | B) with k = 0. In other words, it is the probability that the player picked object i given that no questions have been asked and we know nothing at all yet. On the one hand, we could give all objects an equal P(Ai), since that is fair. On the other hand, Barack Obama will almost certainly be picked far more often than Holden Caulfield, so, all other things being equal (that is, when we cannot tell the objects apart), Obama should be the guess. Consequently, the natural estimate of P(Ai) is the ratio of the number of games in which object i was picked to the total number of games.
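
A sketch of this estimate, assuming we keep a hypothetical games_won counter per object (the numbers below are invented):

```python
def prior(games_won, obj):
    """games_won[i]: the number of games in which object i was picked."""
    return games_won[obj] / sum(games_won.values())

games_won = {"Barack Obama": 120, "Holden Caulfield": 3}
print(prior(games_won, "Barack Obama"))  # 120/123 ~ 0.976
```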

The likelihood P(B | Ai) also gets a convenient interpretation. But first we need one small trick: assume the answers to the questions are conditionally independent given Ai (a somewhat crude simplification, but a very convenient one for us). In plain language, this means that under this assumption the probability P(B | Ai) can be written as a product (over j) of probabilities P(Bj | Ai), where Bj is the event "question Qj was answered with Rj". P(Bj | Ai) is then the ratio of the number of times that, with object i in mind, question Qj was answered with Rj, to the number of times that, with object i in mind, question Qj was asked at all. To avoid zero and undefined probabilities, I propose additionally assuming that, initially, every answer has already been given once for every question. That is, if question Qj has never been asked about object i, P(Bj | Ai) equals 1/Nj, where Nj is the number of possible answers to question Qj (I, by the way, used the same four answer options everywhere: "yes", "no", "don't know", and "the question makes no sense").
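
A sketch of this smoothed likelihood; counts[obj][question][answer] is a hypothetical storage layout of my choosing, and the +1 terms implement the "every answer was already given once" assumption (add-one smoothing):

```python
ANSWERS = ["yes", "no", "don't know", "the question makes no sense"]

def likelihood(counts, obj, question, answer):
    """counts[obj][question][ans]: how many times `ans` was given to
    `question` in games where object `obj` was the one in mind."""
    q_counts = counts.get(obj, {}).get(question, {})
    times_answered = q_counts.get(answer, 0)
    times_asked = sum(q_counts.values())
    # An unseen question yields exactly (0 + 1) / (0 + Nj) = 1/Nj.
    return (times_answered + 1) / (times_asked + len(ANSWERS))
```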

Let's sum up the intermediate result. We have found a simple formula that maps a set of question/answer pairs and an entity to the probability that this entity is the one the player picked, given those answers. By recalculating this probability for all objects in our database after each new answer, we can see which of them currently best match the hidden object. Moreover, training the model is implemented quite simply: for each entity we just store which questions were asked about it and how many times users gave each kind of answer, and update this information after every game based on the user's answers. Also, to take an object's "popularity" into account, the database should store the number of times each object was picked.
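
The "training" step then amounts to incrementing counters once the game ends, sketched here under the same hypothetical storage layout as above:

```python
from collections import defaultdict

# Nested counters counts[obj][question][answer], created lazily so that
# new objects and questions appear on first use.
counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
games_won = defaultdict(int)  # the per-object "popularity" counter

def update_after_game(true_obj, answers):
    """answers: the (question, answer) pairs from the finished game."""
    games_won[true_obj] += 1
    for question, answer in answers:
        counts[true_obj][question][answer] += 1
```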

Question selection, information and entropy


Well, it only remains to understand which questions are best to ask. Naturally, we should ask the questions that give the most information. But can we somehow measure this information? It turns out we can, using the concept of informational entropy. Roughly but understandably speaking, informational entropy is a characteristic of the distribution of a random variable (measured, like information, in bits) that shows how uncertain we are about the value this random variable will take. For example, if a random variable takes the value 1 with probability 0.99 and the value 0 with probability 0.01, the entropy of such a distribution is very close to zero. If a random variable takes, say, the values 0 and 1 with equal probabilities of 0.5 (heads or tails), its entropy equals 1 bit (which is exactly the amount of information we need to receive to eliminate the uncertainty).
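
Both examples are easy to check against the standard formula H = -Σ p · log2(p):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.99, 0.01]))  # ~0.081 bits: almost no uncertainty
print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin flip
```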

Okay, then each time let's choose the question whose answer will most strongly reduce the entropy of the distribution P(Ai | B), which is exactly what encodes our knowledge of whom the player has in mind. Another problem immediately arises here: generally speaking, different answers to the same question can reduce the entropy by different amounts. What to do? Find the question for which the expected decrease in entropy is maximal. The expected decrease in entropy shows how much the entropy drops "on average" if we ask the question. Rather than write several more paragraphs of text, I will simply give the formula by which this value can be calculated; interested readers can easily work out why it has this form. So, each time we need to ask the question j for which the value H[P(Ai | B, <Qj, yes>)] · P(<Qj, yes>) + ... + H[P(Ai | B, <Qj, no>)] · P(<Qj, no>) is minimal. Here H[P] denotes the entropy of the probability distribution P, and <Qj, Ans> denotes the event "question Qj is answered with Ans". The value P(<Qj, Ans>) is easy to find by the law of total probability, summing over all known objects: P(<Qj, Ans>) = Σ(i) P(<Qj, Ans> | Ai) · P(Ai | B).
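
Putting it together, a sketch of question selection that reuses the hypothetical entropy and ANSWERS definitions from the earlier sketches; likelihood here is assumed to be a three-argument callable, e.g. functools.partial(likelihood, counts) over the earlier counts table:

```python
def best_question(objects, candidate_questions, post, likelihood):
    """post: the current distribution P(Ai | B) as a dict object -> prob."""
    best_q, best_h = None, float("inf")
    for q in candidate_questions:
        expected_h = 0.0
        for ans in ANSWERS:
            # Law of total probability:
            # P(<Qj, Ans>) = sum_i P(<Qj, Ans> | Ai) * P(Ai | B)
            p_ans = sum(likelihood(obj, q, ans) * post[obj] for obj in objects)
            if p_ans == 0:
                continue
            # The posterior we would hold if the player answered `ans`.
            new_post = [likelihood(obj, q, ans) * post[obj] / p_ans
                        for obj in objects]
            expected_h += p_ans * entropy(new_post)
        if expected_h < best_h:
            best_q, best_h = q, expected_h
    return best_q
```

Minimizing the expected posterior entropy is the same as maximizing the expected entropy decrease, since the current entropy does not depend on which question we ask.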

It turns out that this approach quickly discards irrelevant questions and focuses on the most important ones. In a sense, this method is a generalization of the "divide in half" method to a probabilistic setting. You can see how it all works together in the video below.



Summary


I hope some of you found the information in this short article interesting. Above all, this article is meant to draw your attention to the power of (even the simplest) mathematics in solving poorly formalized problems. Using the Bayesian approach to probability theory and information theory in your programs can make them more rational, and at the same time more humane :)

Source: https://habr.com/ru/post/84364/

