Formulation of the problem
Context: there is a chess tournament with a fairly large number of players of different levels.
Decisions were made: not to split the players into fixed groups (we don't know each other yet, so it's unclear who belongs where), and not to run a knockout tournament (with so many beginners, people would just feel hurt after being eliminated in the first round). Pairing opponents of roughly equal strength we manage well enough by hand.
Task: build a rating system from the tournament results. Since we are not playing a knockout, there is no final. Simply counting points is not serious given how heterogeneous the players are. In other words, the rating system should be such that beating the weakest player, or losing to the strongest one, has practically no effect on your rating.
Theory
We chose the maximum likelihood method: if we define the probability of each possible outcome of a game as a function of the two players' ratings, we can compute the probability of every result that actually occurred. Multiplying these probabilities together gives the probability of the whole tournament being realized. Then we "only" need to maximize this function of several variables to obtain the most likely ratings of all players.
The next two paragraphs can be skipped if you are only interested in the programming part; they boil down to the choice and calibration of the probability functions, which a programmer can treat as given. Mathematically inclined readers may find it interesting to dig into the functions themselves.
Function selection
Since we had no fundamental research to build on, the probability function was chosen for ease of implementation and minimal plausibility: the normal distribution.
Let the probability of player A beating player B be proportional to the cumulative distribution function of the standard normal distribution (sorry for the tautology), evaluated at the difference of the two ratings. That is, with a rating difference of 2 points (2 standard deviations), the probability of the stronger player winning is over 95%.
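In code, this win probability is simply the standard normal CDF evaluated at the rating difference. A minimal sketch in Python (function names are mine):

```python
from math import erf, sqrt

def p_win(r_a, r_b):
    """Probability (before the draw correction introduced below) that A
    beats B: the standard normal CDF at the rating difference."""
    return 0.5 * (1.0 + erf((r_a - r_b) / sqrt(2.0)))

print(p_win(0.0, 0.0))  # equal ratings: 0.5
print(p_win(2.0, 0.0))  # 2 standard deviations apart: ~0.977, i.e. over 95%
```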
As for calibration: the choice of the mean is forced (zero, so that equal players win with equal probability), and the choice of standard deviation does not matter, since it only sets the scale; multiplying all ratings by an arbitrary coefficient corresponds to choosing a different standard deviation. The normal distribution has no other parameters.
Draw and calibration
Now we need to account for the possibility of a draw: define its probability as a function of the two ratings, and scale the previous function down accordingly.
We decided not to model draws in any sophisticated way: given our level, we were surprised there was even one draw in the tournament. So we took the same normal distribution again, and made the probability of a draw proportional to the density of the standard normal distribution at the rating difference. That is, the probability of a draw does not depend on the level of the players, only on the difference between their levels.
Calibration is trickier here. First, the standard deviation (relative to that of the previous law): we set them equal. Second, the proportionality coefficient, which determines the probability of a draw between two identically rated players. Estimating that probability at 20%, we computed the coefficient: 0.5.
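Putting the two pieces together, here is one way to combine them so that the three outcome probabilities sum to one: take the draw probability off the top, then split the remainder between the players in proportion Φ(d) : Φ(−d). The article does not spell out the exact normalization, so this split is my assumption; the coefficient 0.5 matches the ~20% draw calibration (0.2 / φ(0) ≈ 0.5):

```python
from math import erf, exp, pi, sqrt

C_DRAW = 0.5  # calibrated so equally rated players draw with probability ~20%

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def outcome_probs(r_a, r_b):
    """Return (P(A wins), P(draw), P(B wins)) for ratings r_a, r_b."""
    d = r_a - r_b
    p_draw = C_DRAW * norm_pdf(d)  # depends only on the rating difference
    rest = 1.0 - p_draw
    return rest * norm_cdf(d), p_draw, rest * norm_cdf(-d)

pa, pd, pb = outcome_probs(0.0, 0.0)
print(round(pd, 3))  # ~0.199: about the 20% used for calibration
```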
Practical implementation
Excel
Excel handles maximizing a function via Solver. The problem is that the value of our function (the probability of the tournament being realized) is extremely small (with equal ratings we start around 1E-120), and at such values Excel's stopping criterion misbehaves. By design it does not stop at the maximum but "close to" it, and on our data that proximity criterion worked terribly: the results varied depending on the initial ratings, and even on the order of the players in the table.
So we decided to move to a system where we have a bit more control over the rules of the calculation.
Matlab
MatLab also has functionality for maximizing a function. For better stability of the method, and to see more clearly what is going on, we decided to maximize iteratively, one variable at a time: fix all ratings except one, so that the probability becomes a function of a single variable; find its maximum, fix that rating, release the next one, and so on. Over several passes the function is maximized with respect to all variables.
In practice, ten passes were enough.
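This pass-by-pass maximization can be sketched outside MatLab as well. Below is a self-contained Python version (the toy results and all names are mine), which maximizes the tournament likelihood by golden-section search on one rating at a time; I work with the log of the probability purely for numerical comfort, the maximizer is the same:

```python
from math import erf, exp, log, pi, sqrt

C_DRAW = 0.5

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

# Toy results: (A, B, outcome); outcome 1 = A won, 0 = draw, -1 = B won.
GAMES = [("anna", "boris", 1), ("boris", "carl", 1),
         ("anna", "carl", 1), ("carl", "dina", 0)]

def game_loglik(d, outcome):
    """Log-probability of one game given the rating difference d."""
    p_draw = C_DRAW * norm_pdf(d)
    if outcome == 0:
        return log(p_draw)
    return log((1.0 - p_draw) * norm_cdf(d if outcome == 1 else -d))

def total_loglik(ratings):
    return sum(game_loglik(ratings[a] - ratings[b], r) for a, b, r in GAMES)

def maximize_one(ratings, player, lo=-5.0, hi=5.0, iters=60):
    """Golden-section search over a single player's rating, others fixed."""
    phi = (sqrt(5.0) - 1.0) / 2.0
    def f(x):
        ratings[player] = x
        return total_loglik(ratings)
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:                        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    ratings[player] = (a + b) / 2.0

players = ["anna", "boris", "carl", "dina"]
ratings = {p: 0.0 for p in players}
for _ in range(10):                  # ten passes, as in the article
    for p in players:
        maximize_one(ratings, p)
print({p: round(r, 2) for p, r in ratings.items()})
```

Note that the search is confined to a fixed interval: a player who only won (anna here) would otherwise push her rating to infinity.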
Stability check
We ran several stability tests: different initial values, shuffled player order, and so on. The method is quite stable. In the end we kept two calculations: players in alphabetical order and players in reverse alphabetical order. After ten iterations the two results are compared; if the two runs order a pair of adjacent players in the table differently, we declare that they share one place in the table.
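The "shared place" rule is easy to state in code. A small sketch (function and argument names are mine) that takes the two final orderings, best player first, and lists the adjacent pairs on which the runs disagree:

```python
def shared_places(order_fwd, order_rev):
    """Adjacent pairs in order_fwd that appear in the opposite order in
    order_rev: by the rule above, each such pair shares one place."""
    pos = {p: i for i, p in enumerate(order_rev)}
    return [(x, y) for x, y in zip(order_fwd, order_fwd[1:]) if pos[x] > pos[y]]

# The two runs agree except on boris and carl, who therefore share a place.
print(shared_places(["anna", "boris", "carl", "dina"],
                    ["anna", "carl", "boris", "dina"]))
# → [('boris', 'carl')]
```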
Rake
As a comic aside, here is one rake we stepped on. MatLab's extremum-search function looks for a minimum, not a maximum. The first instinct: instead of maximizing the probability of the tournament's realization, minimize the probability of its non-realization, i.e. one minus it.
The problem is that 1E-120 fits in an ordinary variable without trouble, but 1 − 1E-120 is confidently rounded to one (no floating-point mantissa holds 120 significant digits), and the whole computation goes down the drain.
The obvious solution: negate the function and minimize that instead.
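The rounding is easy to demonstrate: a double holds roughly 16 significant decimal digits, so the subtraction loses everything, while negation loses nothing. A quick check (Python here, but the arithmetic is IEEE 754 and behaves the same in MatLab):

```python
p = 1e-120          # the order of magnitude of the tournament probability

# "minimize 1 - p" collapses: the subtraction rounds to exactly 1.0
print(1.0 - p == 1.0)   # True: the information in p is lost entirely

# negating loses nothing: -p is just p with the sign bit flipped
print(-p == 0.0)        # False: the value survives, so minimizing -p works
```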
Shortcomings of the model
The biggest drawback of the model is that it is hard to explain to the players. More precisely, not even the model itself, but the fact that your rating can change as a result of games in which you did not take part.
Let me explain with an example: you beat player X, whose rating is 1. From this single fact the system assigns you a rating slightly above 1, say 1.1. Suppose you play no more games in the tournament, while X loses a bunch of them and his rating drops to −1. Your rating drops along with X's, since he is your only reference point.
This example is quite simple; in practice the ratings dance in complex and unpredictable ways.
The theoretical justification of this behavior (or rather, of how it differs from the usual behavior of ratings) is as follows:
1. If we treat a rating as a perfectly accurate measure of a player's strength, and a rating change as reflecting a change in strength, then yes, our rating should change only as a result of our own games, and the formula should use our opponents' ratings as they were at the time we played them.
2. If we treat the rating as an imperfect approximation of some ideal value, and a rating change as an improvement of our estimate in light of new data, then every player's rating may change as soon as any new data about our world becomes known.
From this a strong limitation of the model follows directly: it does not account for a player's evolution; the absolute, "true" rating is assumed constant throughout the tournament.