What is the most successful tool you know for predicting and evaluating future events? Coffee grounds? A coin toss? Or an opinion poll? This article describes a new way of assessing future events, one that may soon become one of the most reliable prediction tools.
You will learn about the possibilities of mining opinions in social media, about the so-called “prediction markets”, and also about who will win the Champions League final on May 28 at Wembley Stadium.
From Condorcet's theorem to the prediction market
Is it true that the majority is never mistaken? That the voice of the people is always right?
More than 200 years ago, the French scholar and politician the Marquis de Condorcet answered this controversial question in his jury theorem:
if the probability that each independent individual predicts correctly is greater than 50%, then the probability that the majority gives the correct prediction tends to 100% as the number of predicting individuals grows.
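The theorem can be checked numerically. The following is a minimal sketch, not part of the original study: the function name `majority_correct_probability` and the sample value of 55% accuracy per voter are assumptions chosen for illustration. It sums the binomial probabilities of every outcome in which more than half of the voters are right.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """Probability that more than half of n independent voters are right.

    p is the chance that any single voter is right; n must be odd
    so that a tie is impossible.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    smallest_majority = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )


# Assumed example: every voter is right 55% of the time
for n in (1, 11, 101, 1001):
    print(n, round(majority_correct_probability(0.55, n), 4))
```

The printed probability grows with the number of voters. The same function shows the flip side of the theorem: with `p` below 0.5 the majority becomes less reliable as the crowd grows.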
With the rise of social media and the ability of everyone (or at least every second person) to express an opinion online, the Internet is becoming just as representative a venue for collecting statistics as public opinion polls, while being much simpler, more dynamic, and relatively inexpensive when it comes to obtaining and analyzing results.
And sometimes a narrow slice of people is easier to find on thematic sites online than to track down offline.
It would have been a sin not to take advantage of such a wealth of information and opinions, and so a new class of applications was created on the basis of Condorcet's theorem:
prediction markets (PMs). There are already a few hundred of them. At their core, such applications are speculative markets, much like a stock exchange, whose participants aim to make money on predictions. The higher your bet on a particular scenario, the more weight your vote carries. We will cite just one striking fact that proves such applications have a right to exist:
“In the United States, PM predictions turned out to be more accurate in every recent presidential election than any public opinion poll or expert forecast. The average error (MAPE, mean absolute percentage error) of the IEE was only 1.5%, against an error of 2.4% for the Gallup Poll (which has always been famous for the most accurate estimates).”
The success of online predictions is near
One of the first to test the voice of the blogosphere in practice, not through artificially created exchanges but through third-party observation and data gathering, was the American company General Sentiment.
Last spring, it monitored social and news media to predict who would win the popular American show American Idol (the progenitor of Star Factory). You can read the full study
at this link (where you can also download it as a PDF); we will outline only the main points.
The study relies on three indicators: Media Value, Sentiment, and Volume.
The Media Value indicator converts all mentions of a person into a monetary value (in dollars): the amount that the person or brand would have to spend on traditional media channels (PR campaigns, events, paid articles and reviews, etc.) to generate a similar wave of discussion. The numbers turned out to be really serious.
Sentiment is the tone of the discussions, calculated according to the company's own Sentiment Index.
Volume is the total number of mentions of the brand.
The Americans published their research right before the finale of the TV show, which two contenders had reached: Crystal Bowersox and Lee DeWyze. However, Media Value was calculated for 7 participants, and we can see that from the very beginning the audience paid much less attention to all the other participants than to the finalists Crystal and Lee (dates: April to May 2010).

Having also calculated the Sentiment and Volume indicators for the two finalists, the people at General Sentiment bet on Crystal Bowersox ... But in the end, Lee DeWyze won.
Of course, the case would be more elegant if the prediction had come true, but this and a number of subsequent General Sentiment studies are bringing closer the era when the outcome of mass events can be predicted by analyzing opinions online.
But all of this is Western research. We adopted similar mechanics and decided to build something comparable for the Russian-speaking Internet audience. And, of course, we could not resist the temptation to look into the future and find out who will win the most spectacular football tournament in Europe, the Champions League.
We analyzed discussions of the Champions League final to find out which team the majority of users expect to win, and made a forecast based on that.
How does it work?
Data was collected from various types of online sources, from forums to online news media.
In the course of the research, the main sources emerged on their own: the places where most of the conversations about the outcome of the match took place. These are football communities and portals with extensive discussions in the comments:
http://www.eurosport.ru ,
http://news.sportbox.ru ,
http://www.championat.ru ,
http://football.ua.ua and a number of others
The search was conducted by keywords:
“Manchester”, “MU”, “Mancunians”, etc., in the context of words such as
“Barcelona”, “Barca”, “Badgers” ...
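A keyword filter of this kind could be sketched as follows. This is an illustration under stated assumptions, not the tooling actually used: the two short English keyword lists and the names `build_pattern` and `mentions_final` are hypothetical, and whole-word matching is a simplification that would miss inflected Russian word forms.

```python
import re

# Assumed English stand-ins for the keyword lists; the real lists were longer
MANCHESTER_KEYWORDS = ["manchester", "mu", "mancunians"]
BARCELONA_KEYWORDS = ["barcelona", "barca", "badgers"]


def build_pattern(keywords):
    """Compile a case-insensitive pattern matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(word) for word in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


MANCHESTER_RE = build_pattern(MANCHESTER_KEYWORDS)
BARCELONA_RE = build_pattern(BARCELONA_KEYWORDS)


def mentions_final(comment: str) -> bool:
    """True when a comment names one team in the context of the other."""
    return bool(MANCHESTER_RE.search(comment)) and bool(BARCELONA_RE.search(comment))


# Made-up sample comments, for illustration only
comments = [
    "Barca will beat Manchester in London",
    "Manchester is rainy this time of year",
    "MU vs Barcelona, can't wait",
]
relevant = [c for c in comments if mentions_final(c)]
print(relevant)
```

Requiring a keyword from both lists keeps comments about the final and drops ones that mention only one club in some other context.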
The most difficult stage was, of course, analyzing the tone of the forecast, that is, determining which team the author of a given comment is backing.
For this, we compiled a large dictionary, which was constantly expanded (we substantially enriched our personal vocabulary along the way!), for example:
“win”, “make”, “beat”, “merge”, “blow through” ... and many other synonyms. However, the task was far from easy, since the “living Great Russian” language of our forums is difficult to interpret automatically (in some cases, for example, a comment would simply give a score in favor of one team or the other), so a significant share of the mentions was processed and evaluated manually.
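A dictionary-based classifier along these lines might look like the sketch below. Everything in it is an assumption made for illustration: the tiny English phrase lists, the two-word window between a team name and an outcome phrase, and the function name `classify`. Comments with no signal or with conflicting signals return `None`, which stands for the manual review queue. Negation ("can't beat") and bare scores are not handled.

```python
import re

# Hypothetical, heavily shortened stand-ins for the article's dictionary
TEAM_KEYWORDS = {
    "Manchester United": ["manchester united", "manchester", "mu"],
    "Barcelona": ["barcelona", "barca"],
}
WIN_PHRASES = ["will win", "wins", "win", "will beat", "beats", "beat"]
LOSE_PHRASES = ["will lose", "loses", "lose"]


def _alternation(phrases):
    # Longest first, so "manchester united" is tried before "manchester"
    ordered = sorted(phrases, key=len, reverse=True)
    return "|".join(re.escape(p) for p in ordered)


def _team_then(team_words, outcome_phrases):
    """Team keyword followed by an outcome phrase, at most two words apart."""
    return re.compile(
        rf"\b(?:{_alternation(team_words)})\b"
        rf"(?:\W+\w+){{0,2}}?\W+"
        rf"(?:{_alternation(outcome_phrases)})\b",
        re.IGNORECASE,
    )


WIN_RE = {t: _team_then(kw, WIN_PHRASES) for t, kw in TEAM_KEYWORDS.items()}
LOSE_RE = {t: _team_then(kw, LOSE_PHRASES) for t, kw in TEAM_KEYWORDS.items()}


def classify(comment):
    """Return the team the comment backs, or None if it needs manual review."""
    teams = list(TEAM_KEYWORDS)
    backed = set()
    for team, rival in (teams, teams[::-1]):
        if WIN_RE[team].search(comment):
            backed.add(team)   # "<team> will win"
        if LOSE_RE[team].search(comment):
            backed.add(rival)  # "<team> will lose" counts for the rival
    return backed.pop() if len(backed) == 1 else None


print(classify("MU will win, 100%"))
print(classify("Manchester will lose to Barca"))
print(classify("3:1"))
```

Returning `None` rather than guessing mirrors the split between automatic and manual processing: only unambiguous comments are counted automatically.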
The most popular comment sources, with the ratio of forecasts for each team:

In total, more than one and a half thousand comments discussing the final were collected and processed, and fewer than a quarter of them contained a clear indication of the winner. As a result, approximately 60% of the votes (204) were cast for a Manchester United victory, and only the remaining 40% (145) for Barcelona.
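The arithmetic behind these figures is easy to reproduce. The sketch below uses only the two vote counts and the "more than one and a half thousand" lower bound from the text; the variable names are arbitrary.

```python
# Vote counts reported in the article
votes = {"Manchester United": 204, "Barcelona": 145}

total_votes = sum(votes.values())
shares = {team: count / total_votes for team, count in votes.items()}

for team, share in shares.items():
    print(f"{team}: {share:.1%}")
# Manchester United: 58.5%
# Barcelona: 41.5%

# The article reports "more than" 1,500 comments, so 1,500 is a lower bound
# and the share of comments with a clear forecast is at most this value
MIN_COMMENTS = 1500
max_forecast_share = total_votes / MIN_COMMENTS
print(f"Comments with a clear forecast: at most {max_forecast_share:.1%}")
```

The exact split is 58.5% to 41.5%, which the text rounds to 60% and 40%, and the 349 usable votes are indeed fewer than a quarter of the comments collected.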
Of course, we ran into many difficulties. First of all, there are the technical aspects of collecting and processing the information. The Russian language is so rich that it is difficult to account for every variant of a mention (although we tried), so some mentions may have slipped past us.
Secondly, if you re-read Condorcet's theorem, the key condition for a correct outcome is that each individual's probability of predicting correctly is greater than 50%. We believe we were able to meet this condition, since the mentions were collected from specialized football sites, where people do not make their predictions out of thin air: they follow football tournaments and know the strength of the teams.
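How much this condition matters for a sample of this size can be illustrated with a small simulation. It is a sketch under assumptions: the 349 voters correspond to the 204 + 145 comments with a clear forecast, while the per-voter accuracies of 45%, 50%, and 55%, the number of trials, the seed, and the function name are all made up for the example. Voters are also assumed to be independent, which forum commenters may well not be.

```python
import random


def share_of_correct_majorities(p, n_voters=349, trials=1000, seed=2011):
    """Fraction of simulated polls in which most voters picked the true winner."""
    rng = random.Random(seed)
    correct_majorities = 0
    for _ in range(trials):
        # Each voter is independently right with probability p
        correct_votes = sum(rng.random() < p for _ in range(n_voters))
        if correct_votes > n_voters / 2:
            correct_majorities += 1
    return correct_majorities / trials


for p in (0.45, 0.50, 0.55):
    print(p, share_of_correct_majorities(p))
```

Even a small edge above 50% per voter makes the majority right in most simulated polls, while the same edge below 50% makes it wrong nearly as often, which is why the quality of the sources matters more than their number.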
Thirdly, psychology could introduce some error: Barcelona has recently crushed everyone in its path and won quite a lot of tournaments, so many people are simply tired of it and will root for Manchester United and bet on them.
In general, the methodology still needs polishing and improvement, but we have already run off to the nearest bookmaker :)
We invite you to take part in the vote as well and say right now who will win the European Champions Cup. All votes are collected on the Facebook page. Thanks for participating!
We would also like to hear the Habr community's opinion on how likely the forecast is to come true. What drawbacks do you think this kind of research has? We will be grateful for any constructive feedback!
And to set the mood - a few examples of comments from the blogosphere:
- Kohl, maybe not 3:1, but Manchester United will take this final :) Fergie may be old, but he is a very cunning fox. Barca is certainly good, both this year and last, but in terms of character they are far from Manchester United. Anyway, we will see, and you get a bottle of wine ready; luckily you don't even have to go far for it now) ( link )
- BARCA will wipe out the shots of MU, as well as the sour team of the maula !!! Barca is a true champion !!! ( link )
- MU will win. The info is 100% certain =) ( link )
- yo !!! I bet mex at work 20 bucks that Barca will beat Manchester ... waiting for the final on the 28th ... ( link )
- I think it will be a very interesting game, but the advantage will be 100 percent on Barca's side, so to resist Barca you need to play on the counter-attack and not sit back at your own goal as MU usually does. ( link )