About sorting content based on user ratings: Part 2

The previous article attracted a lot of interest and, for a while, was even the top article of the last 24 hours. Since then I have had a few new ideas, and some of the questions in the comments deserve more detailed answers.

The "one vote" problem vs. "the rich get richer"

As a reminder, the main problem is this: if the rating of an article or product is computed as the arithmetic mean of user votes (the simplest option), an article with a single 5-point vote can rank above an article with 100 votes of 5 points and one vote of 4 points. We call this the "one vote problem", although it affects more than just single-vote articles.

To avoid this, the number of votes has to be taken into account. But doing so creates another problem: "the rich get richer". Older articles have more votes, so they rank higher, receive more visits and even more votes, and therefore pull even further ahead of young articles. The effect appears even if all articles are added at the same time; the top is then occupied not by old articles but by those lucky enough to receive a random vote early in the voting.
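
To make the one vote problem concrete, here is a minimal Python sketch of the plain arithmetic mean applied to the two articles from the example. The function name `simple_rating` is chosen for illustration only and does not come from the article.

```python
from statistics import mean


def simple_rating(votes):
    """Arithmetic mean of the votes: the 'simplest option' from the text."""
    return mean(votes)


# One article with a single 5-point vote ...
single_vote = [5]
# ... and one with 100 votes of 5 points plus one vote of 4 points.
many_votes = [5] * 100 + [4]

print(simple_rating(single_vote))           # 5
print(round(simple_rating(many_votes), 3))  # 4.99
```

The single-vote article ends up ranked above the article with 101 votes, even though the second result is far better supported.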

The more traffic comes through the rating, the stronger this effect. The paradox is that the more the rating is needed, the worse it works. No elegant function solves both problems at once; you can only look for a middle ground that minimizes their combined effect.

There are some "non-smooth" solutions, though. For example, you can exclude from the rating any article with fewer votes than some threshold. However, such articles then stay out of the rating for a long time. If an article gets most of its visits from the rating, some articles will make it in only after several years. In some cases this is unacceptable.

Another option is to show the rating for a limited period of time, for example the last 24 hours, as on Habr. The rich still stay rich: an article that is a few hours old has little chance of overtaking a 23-hour-old one.

Plus / minus and a sense of justice

In a plus/minus rating the number of votes is included implicitly: the sum of pluses and minuses depends linearly on the number of article views. As already noted, such a rating has no "one vote" problem. However, the "rich get richer" effect should be stronger in it than in most schemes that solve the "one vote" problem for other types of ratings. Yet this is not what happens...

Most users are honest and try to help the site. There are far fewer vandals than well-meaning people. This is the philosophy of Wikipedia, and you can easily convince yourself that it works simply by opening Wikipedia.

A user is more likely to upvote an article that they consider undervalued than an article they liked but consider correctly ranked. Likewise, an "overvalued" article is more likely to get a minus than an article sitting in its "right" place.

Look at Habr's listing for the last 24 hours. From a mathematical point of view, almost all of its articles should be about 24 hours old. But they are not. There are no brand-new articles in it, but articles only 3-5 hours old often take first place. A mechanism of self-organization is at work.

For star ratings this mechanism also works, but much less well.

Statistical error

To overcome the one vote problem, we need to compute a certain value, call it the "statistical error", and, in the simplest case, subtract it from the article's rating. The question is how to compute it. Even if we know the distribution and its parameters, the error can vary over a wide range depending on the confidence level we require. So any estimate of the error is subjective to some degree. For example, you cannot be 100% sure that a juice bottle contains one liter ± 100 ml: the filling machine could shut down and pour nothing at all. The probability of that is small, but not zero.

In experimental physics it is generally accepted that the random measurement error decreases as the number of experiments grows, in inverse proportion to the square root of that number. From a mathematical point of view, however, the usual confidence interpretation of this rule holds strictly only for a normal distribution, and voting results sometimes differ greatly from it. Nevertheless, this approach gives good results in all cases; I will explain why later.

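Here is our error:

$$\text{error} = \frac{\sigma}{\sqrt{n}}$$

Here $\sigma$ is the standard deviation of the votes, that is, the square root of the mean squared deviation from the average, and $n$ is the number of votes. The standard deviation measures the spread of the votes. Subtracting the error from the rating gives a lower bound on the rating estimate.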

There are two problems here. First, for an existing rating you can compute the standard deviation only if all user votes were stored individually. Second, for an article with a single vote the standard deviation is 0, and for articles with few votes the standard deviation is itself determined with a large statistical error.
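
The lower bound and its single-vote failure can be sketched in Python as follows. This is only an illustration under stated assumptions: the error is taken as the standard deviation divided by the square root of the number of votes, the population standard deviation (`statistics.pstdev`) is used, and the name `lower_bound` is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def lower_bound(votes):
    """Mean of the votes minus the error sigma / sqrt(n).

    Assumption: sigma is the population standard deviation of the votes.
    All individual votes must be available to compute it.
    """
    n = len(votes)
    return mean(votes) - pstdev(votes) / sqrt(n)


single_vote = [5]
many_votes = [5] * 100 + [4]

# With one vote the standard deviation is 0, so nothing is subtracted.
print(lower_bound(single_vote))
# With 101 votes a small error is subtracted from the mean.
print(lower_bound(many_votes))
```

The single-vote article keeps its full 5 points and still ranks first, which is the second problem described above.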

The easiest way around both problems is to take the deviation to be a fixed fraction of the article's rating:

$$\dot{R}_i = R_i - \frac{k R_i}{\sqrt{n}} = R_i \left(1 - \frac{k}{\sqrt{n}}\right)$$

Here $\dot{R}_i$ (R with a dot) is the resulting rating of article $i$, $R_i$ (without the dot) is its original rating, that is, the average of all its votes, and $n$ is its number of votes.

The coefficient $k$ ranges from 0 to 1. With k = 0 the formula degenerates into the arithmetic mean; with k = 1 an article with a single vote gets zero weight. k is a measure of conservatism: the higher it is, the faster the rich get richer, but the weaker the one vote effect. The task is to find a balance, so in many cases the middle value of 0.5 is justified.

This method handles the "one vote" problem well. At the same time, because of the square root, the rating's growth slows down as votes accumulate, which weakens the "rich get richer" effect: to reduce the penalty by a factor of 10, the number of votes must grow by a factor of 100. This is why the method can be used for distributions other than the normal one.
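
A small Python sketch of this penalty, assuming the penalized rating has the form R · (1 − k/√n), which is what the description of k implies. The names `penalty` and `penalized_rating` are illustrative, not the author's.

```python
from math import sqrt


def penalty(n, k=0.5):
    """Fraction of the rating treated as unreliable: k / sqrt(n)."""
    return k / sqrt(n)


def penalized_rating(avg, n, k=0.5):
    """Average vote `avg` over `n` votes, reduced by the share k / sqrt(n)."""
    return avg * (1 - penalty(n, k))


# The single 5-point vote now loses to 100 fives and one four.
print(penalized_rating(5, 1))            # 2.5
print(penalized_rating(504 / 101, 101))

# 100 times more votes shrink the penalty about 10 times.
print(penalty(1) / penalty(100))
```

With k = 1 and a single vote, `penalized_rating` returns 0, which is the "zero weight" case mentioned for k = 1.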

Substitution

Compared with the formula from the previous article (weighting with the average), this one is less conservative when the number of votes is large. In other words, the "rich get richer" effect is weaker for heavily visited articles. However, the formula has drawbacks. It is not clear what it represents, whereas the previous formula was an estimate of the article's actual rating. Another problem is that the rating can fall below the minimum mark: with k = 1 and n = 1 the rating is zero, while the minimum is usually 1.

By and large, in this formula we took the part of the article's rating that was considered unreliable and replaced it with zero. If the rating scale starts at 1, it should be replaced with 1 instead. However, if we replace it with the average rating of all articles, the result becomes an estimate of the rating the article will receive in the future rather than its lower bound. That is more correct, and the rating acquires a meaning: comparing lower bounds makes little sense, while comparing expected values (forecasts) does. In addition, this reduces the "rich get richer" effect for young articles, which now start out in the middle of the ranking rather than at the bottom. After all, an article with no votes is almost certainly better than an article with a hundred minimum marks.

$$\dot{R}_i = R_i \left(1 - \frac{k}{\sqrt{n}}\right) + \bar{R}\,\frac{k}{\sqrt{n}}$$

Here $\bar{R}$ (R with a bar) is the average rating of all articles on the site. The formula replaces the unreliable share of the article's rating with the same share of the average article rating. This reduces the influence of not only positive but also negative votes on a young article.

This is a kind of averaging of the article's rating with the average rating. More precisely, it is a weighted arithmetic mean of the article's rating and the average rating of all articles, with weights $1 - k/\sqrt{n}$ (the reliable part of the article's rating) and $k/\sqrt{n}$ (the unreliable part). Indeed, for $0 \le k \le 1$ and $n \ge 1$ both weights are non-negative, and they sum to 1:

$$\left(1 - \frac{k}{\sqrt{n}}\right) + \frac{k}{\sqrt{n}} = 1$$

The value of a weighted average always lies between the minimum and maximum of its elements. That is, the final rating always stays within the required range (for example, from 1 to 5 on a five-star scale). Moreover, it always lies between the "simple" rating and the average article rating.

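Our formula is undefined at n = 0, so for that case we take the average rating of all articles as its value. As a result, the formula looks like this:

$$\dot{R}_i = \begin{cases} \bar{R}, & n = 0 \\ R_i \left(1 - \dfrac{k}{\sqrt{n}}\right) + \bar{R}\,\dfrac{k}{\sqrt{n}}, & n \ge 1 \end{cases}$$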
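
A minimal Python sketch of the resulting rule, assuming the weights 1 − k/√n and k/√n named earlier and the site-wide average as the value for n = 0. The function name `smoothed_rating`, its parameter names, and the sample site average of 3.5 are made up for illustration.

```python
from math import sqrt


def smoothed_rating(avg, n, site_avg, k=0.5):
    """Blend an article's average vote with the site-wide average.

    avg      -- average of the article's votes (ignored when n == 0)
    n        -- number of votes for the article
    site_avg -- average rating of all articles on the site
    k        -- conservatism coefficient between 0 and 1
    """
    if n == 0:
        # The formula is undefined without votes: fall back to the site average.
        return site_avg
    w = k / sqrt(n)  # share of the rating considered unreliable
    return avg * (1 - w) + site_avg * w


SITE_AVG = 3.5  # hypothetical site-wide average on a 1..5 scale

print(smoothed_rating(0, 0, SITE_AVG))    # no votes: the site average
print(smoothed_rating(5, 1, SITE_AVG))    # one 5-point vote is pulled toward 3.5
print(smoothed_rating(1, 100, SITE_AVG))  # a hundred minimum marks stay near 1
```

In this sketch an article with no votes ranks above an article with a hundred minimum marks, and every result stays between the article's own average and the site average.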

If this article turns out to be easy enough to follow, I will write a continuation and cover how to improve the estimate by taking the standard deviation into account, ratings in the "Like" style, and when the formula on glass is still applicable.

P.S. If anyone has a database with several thousand votes in which the votes are stored individually, I would be very grateful if you shared it. Several such databases would make it possible to give numerical measures of forecast quality.

Source: https://habr.com/ru/post/150808/
