A universal rating, or every vote matters to us

The original image from the site caricatura.ru

Recently, I faced the task of sorting films by rating. Each film has 2 values: the average rating and the number of voters (just like on Kinopoisk and IMDb).

The first thing that came to mind, of course, was the Kinopoisk Top 250 and its formula. But even at first glance it seemed imperfect to me: the unexplained choice of entry threshold and the use of the overall average rating are baffling, because under slightly different conditions (fewer votes or lower scores) the formula works extremely poorly.

Read on to see how I solved this particular task and, along the way, the same problem for any rating of this kind.

The essence of the problem

In theory, the higher the score, the more interesting the film and the higher it should rank, but...
Examples describe the situation best.

Example №1

Movie №1
The number of voters - 2
Average rating - 10

Movie №2
The number of voters - 100
Average rating - 7

Yes, film №1 is great, but hardly anyone wants to rely on the opinion of two people. It follows that ranking by the average rating alone is foolish: the top would always be filled with niche films.

There are a lot of low-budget films, and it is easy to guess that most of them will have fairly high ratings (they are watched by viewers who are already interested, undemanding about quality, and looking for the pros rather than the cons).

Kinopoisk solved this problem by introducing a vote threshold (M = 500) and by giving lesser-known films a chance, pulling them toward the average rating of all films: the fewer votes a film has, the closer its rating moves to that average.

Rating = V / (V + M) * R + M / (V + M) * C

Where:

V - the number of votes for the film
M - the threshold of votes required to enter the Top 250 (now: 500)
R - the arithmetic average of all votes for the film
C - the average rating of all films (now: 7.3837)

Example №2

Now consider the following 2 films and calculate their rating using the Kinopoisk formula:

Movie №1

V = 500 (number of votes per film)
M = 500 (threshold of votes)
R = 2 (average film score)
C = 7.3837 (average rating of all films)

Rating = 500 / (500 + 500) * 2 + 500 / (500 + 500) * 7.3837 = 4.69185 (by the way, the formula can be pasted straight into a calculator :-)

Movie №2

V = 1000
M = 500
R = 3
C = 7.3837

Rating = 1000 / (1000 + 500) * 3 + 500 / (1000 + 500) * 7.3837 = 4.4612

A film with half the votes and a lower average score received a higher rating! Most likely this approach suits Kinopoisk: it encourages voting and, as a result, yields greater objectivity in the long run.
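
The two calculations above can be reproduced with a short Python sketch. The function name kinopoisk_rating and its parameter names are chosen here for illustration; the formula and the constants M = 500 and C = 7.3837 are the ones quoted in the article.

```python
def kinopoisk_rating(v, r, m=500, c=7.3837):
    """Weighted rating V/(V+M)*R + M/(V+M)*C from Example №2.

    v: votes for the film, r: the film's average score,
    m: vote threshold, c: average rating of all films.
    """
    return v / (v + m) * r + m / (v + m) * c


film_1 = kinopoisk_rating(v=500, r=2)    # Movie №1 from Example №2
film_2 = kinopoisk_rating(v=1000, r=3)   # Movie №2 from Example №2

print(round(film_1, 5), round(film_2, 4))
print("Movie №1 ranks above Movie №2:", film_1 > film_2)
```

The large constant C dominates while V is close to M, which is why the film with the lower average score and fewer votes ends up on top.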

But encouraging users to vote was not my goal, so this approach did not fit: at any given moment, the sorting within each weight category would be unfair. Besides, the choice of threshold is non-obvious and subjective.

So, the task:
Build a fair top list of films based on the number of votes and the average rating of each film, without using constants.

Solution

Okay, here goes.

We (my wife Katya and I) spent a long time shuffling numbers back and forth in search of the perfect formula, but everywhere we ran into the human factor. And really, what matters more, the number of votes or the rating, and how should they be weighed against each other?

As a result, we decided to approach the issue not from a mathematical point of view, but from a philosophical one:

People vote more for heavily advertised films, which usually have a large budget. The higher the budget, the higher the technical quality of the film.
Yes, rating a film is subjective in general, but we want a Top for everyone and should therefore focus on popularity (popular meaning common, generally understood).
The more votes a film has, the more objective we consider its rating.

As a result, we concluded that the rating and the number of votes are roughly equally important factors, and the solution suggested itself.

To weigh these two values equally, they must be brought to the same scale. The maximum number of votes is always different, but the rating (oh yes!) always runs from 1 to N. For simplicity, we take N = 10. The task therefore reduces to expressing each film's number of votes as a share of the maximum among all films.

Next I will describe the implementation of this approach in MySQL: the mathematicians have already solved the problem by now, and the rest, I hope, will enjoy getting their hands on something ready-made.

So, we create a table:

CREATE TABLE IF NOT EXISTS `films` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `raiting` float NOT NULL,
  `count_votes` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

Add the 4 entries from the examples.

You have probably already guessed that to calculate the share of votes we need the MAX function and a logarithm.

If not, it is all very simple. Since we have accepted that the number of votes and the rating are equally important and decided to bring the votes to a scale from 1 to 10, it suffices to use a logarithmic scale:
ru.wikipedia.org/wiki/%D0%9B%D0%BE%D0%B3%D0%B0%D1%80%D0%B8%D1%84%D0%BC%D0%B8%D1%87%D0%B5%D1%81%D0%BA%D0%B8%D0%B9_%D0%BC%D0%B0%D1%81%D1%88%D1%82%D0%B0%D0%B1

So:

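-- Base of the logarithm: the 10th root of the maximum vote count
SELECT @a := POW(MAX(count_votes), 1/10) FROM films;

-- Average the log-scaled vote count with the film's average rating
SELECT id, name, raiting, count_votes,
       (LOG(@a, count_votes) + raiting) / 2 AS actual_raiting
FROM films
ORDER BY actual_raiting DESC;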

We take the 10th root of the maximum number of votes and use it as the base of the logarithm. This expresses each film's number of votes as a share of the maximum on a logarithmic scale, reduced to 10. We then add the average rating and divide by 2, which is how the two values are weighed against each other.
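
Why the 10th root gives a scale that tops out at 10 can be seen by rewriting the logarithm. In the sketch below, v is a film's number of votes, v_max is the maximum over all films and R is the film's average rating; the symbols v and v_max are introduced here for illustration and do not appear in the query.

```latex
\log_{v_{\max}^{1/10}} v
  = \frac{\ln v}{\tfrac{1}{10}\,\ln v_{\max}}
  = 10 \cdot \frac{\ln v}{\ln v_{\max}},
\qquad
\text{actual\_raiting} = \frac{1}{2}\left( 10 \cdot \frac{\ln v}{\ln v_{\max}} + R \right)
```

Under this reading, the film with the most votes gets a vote score of exactly 10, and a film with a single vote gets 0.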

Here are the results:

The film with two votes has the least objective rating and sits below the first two, but the film with 500 votes is simply too bad. Film №4, despite its small number of votes, is well ahead of its rivals on average score.
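
The ordering described above can be checked with a small Python sketch of the same calculation. It assumes that the four table entries are the films from Examples №1 and №2; the function name actual_rating and the text labels are illustrative, not part of the original implementation.

```python
import math


def actual_rating(votes, rating, max_votes, scale=10):
    """Average of the log-scaled vote count and the average rating.

    Mirrors (LOG(@a, count_votes) + raiting) / 2 from the query,
    where @a = POW(MAX(count_votes), 1/10).
    """
    if votes < 1 or max_votes < 2:
        raise ValueError("need votes >= 1 and max_votes >= 2")
    base = max_votes ** (1 / scale)
    return (math.log(votes, base) + rating) / 2


# Assumed table contents: (label, count_votes, raiting)
films = [
    ("2 votes, average 10", 2, 10),
    ("100 votes, average 7", 100, 7),
    ("500 votes, average 2", 500, 2),
    ("1000 votes, average 3", 1000, 3),
]

max_votes = max(votes for _, votes, _ in films)
ranked = sorted(
    films,
    key=lambda f: actual_rating(f[1], f[2], max_votes),
    reverse=True,
)

for label, votes, rating in ranked:
    print(f"{label}: {actual_rating(votes, rating, max_votes):.4f}")
```

With these inputs the film with 100 votes comes first, followed by the film with 1000 votes, then the film with two votes, and the film with 500 votes comes last.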

Thus, we have created weight categories of films (by popularity), and within each weight category the films are sorted by rating. Films with a roughly equal number of votes are sorted almost purely by rating.

So, we got rid of the threshold and made the films compete fairly, while giving less advertised but good films a chance to rise within their weight category.

Compliments of the chef: actor rating

It is time to explain the main image of the article. It shows an actor rating.

With the first problem solved, the solution here came instantly, following the same rules: the more big films an actor has appeared in, the more popular he is and the better he acts. His acting, in turn, affects the rating of the film.

So, we took the maximum number of films for any single actor, counted the number of films and their average rating for each actor, and applied the same formula.
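
A minimal Python sketch of this step, assuming that an actor's average rating is the mean rating of the films he appeared in. The actor names and film ratings below are hypothetical sample data, and actor_score is an illustrative name.

```python
import math


def actor_score(film_count, avg_rating, max_film_count, scale=10):
    """Same formula as for films, with the film count in place of the vote count."""
    base = max_film_count ** (1 / scale)
    return (math.log(film_count, base) + avg_rating) / 2


# Hypothetical data: actor -> ratings of the films the actor appeared in
filmography = {
    "Actor A": [8.1, 7.4, 6.9, 7.8, 8.0, 7.2],
    "Actor B": [9.5],
    "Actor C": [6.0, 6.5, 7.0],
}

max_films = max(len(ratings) for ratings in filmography.values())
scores = {
    actor: actor_score(len(ratings), sum(ratings) / len(ratings), max_films)
    for actor, ratings in filmography.items()
}

top = sorted(scores, key=scores.get, reverse=True)
for actor in top:
    print(f"{actor}: {scores[actor]:.3f}")
```

In this sample, the actor with a single highly rated film lands at the bottom, just as the film with two votes did in the film ranking.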

This is how Top Actors began to look.

For those interested in seeing the result: http://vk.com/droptv

Source: https://habr.com/ru/post/172065/
