
Forecasting in the gaming industry. Part 1: All About Prediction

I predict the future. No, I am not a fortune teller; I am a data scientist. Sounds suspicious, doesn't it? In truth, no one can actually predict the future: this isn't Spielberg's "Minority Report." But the probabilities and scenarios along which events unfold are quite real.

I have already written about data and the use of analytics in games, and touched on predictive analytics there. In this and the following articles I will cover it in more detail.



Big Data analytics has come a long way, and it is now possible to predict user behavior with fairly high accuracy. That lets me conduct research that combines methods from sociology and computer science.
Forecasting in the gaming industry means being able to predict a player's actions in order to shape strategy, improve the product, retain valuable players, and find new ways to monetize.

But I'm getting a little ahead of myself. Let's first look at the basics of predictive modeling and figure out how and why it works.

Without going into details, the principle of predictive modeling is quite simple to grasp. Take computer games as an example. Imagine that you record and track everything that happens in a game, and over time your computer begins to pick up patterns. Some scenarios repeat, some do not. When the program detects repeated scenarios, it "learns" them and watches for them to recur, then makes predictions about what comes next.

For example, if the sequence ABCD repeats time after time, at some point the program will begin to recognize it. Once that happens, you can tell the system to make a prediction: whenever the sequence ABC occurs, the program predicts which element will appear next (in our case, of course, it will be D). In addition, the program can estimate how accurate this forecast is. D can stand for any element that matters for design, user retention, or monetization: imagine that D is quitting the game, completing a level, or making a purchase in the store.

How is this possible, and why does it matter? Suppose element D does not always follow the sequence ABC, and we periodically see ABCX instead. The more data the algorithm can process, the more accurately it estimates the probability, and the better it can answer the question of how often the prediction comes true.
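The counting described above fits in a few lines of Python. This is a toy illustration, not the author's actual system: given an event log, it tallies what follows each occurrence of a context window and turns the tallies into probabilities.

```python
from collections import Counter

def next_event_probabilities(events, context):
    """Estimate how often each event follows a given context window."""
    n = len(context)
    followers = Counter(
        events[i + n]
        for i in range(len(events) - n)
        if events[i:i + n] == context
    )
    total = sum(followers.values())
    return {event: count / total for event, count in followers.items()}

# Toy event log: ABCD repeats, but once we see ABCX instead.
log = list("ABCDABCDABCXABCD")
print(next_event_probabilities(log, list("ABC")))  # {'D': 0.75, 'X': 0.25}
```

With more data the estimates sharpen: the 0.75 here is exactly the "D follows ABC in 75% of cases" figure used in the rest of the article.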

That's all there is to prediction. As for the accuracy of such forecasts, it is ensured by the use of scientific methods: there are several tests that data scientists use to validate predictive models. Keep in mind that validation means "yes, this model worked," not "I can predict the future and know it will work tomorrow." If nothing changes, why shouldn't it work tomorrow? Failure happens when "tomorrow" differs from the assumed scenario. If a terrorist attack happens or the school year ends, our assumptions may turn out to be wrong, and we will not be able to predict further events. The end of the school year can still be folded into a recurring scenario; a terrorist attack usually cannot.

One common validation method is cross-validation, in its simplest holdout form. It works as follows: the program splits a huge array of data (game data, for example) into two equal parts. It then takes the first part, analyzes it, searches for repeated scenarios, and builds a model. As a result we get a rule: after the sequence ABC, element D follows in 75% of cases. The program then checks the model's accuracy against the second, untouched half of the data. If the ABCD pattern also occurs in 75% of cases in the second part, we can consider the forecast reasonably accurate. In effect, we are 75% confident in this forecast; 75% is the degree of confidence.
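A minimal sketch of this holdout check, under the same toy assumptions as before (a symbolic event log rather than real game telemetry): fit the "D follows ABC" rate on the first half of the data, then measure the same rate on the untouched second half.

```python
def follow_rate(events, context, target):
    """Fraction of occurrences of `context` that are followed by `target`."""
    n = len(context)
    hits = matches = 0
    for i in range(len(events) - n):
        if events[i:i + n] == context:
            matches += 1
            hits += events[i + n] == target
    return hits / matches if matches else 0.0

# Hypothetical event log where ABCX occasionally replaces ABCD.
log = list("ABCDABCXABCDABCD" * 2)
half = len(log) // 2
train, test = log[:half], log[half:]

p_train = follow_rate(train, list("ABC"), "D")  # model: P(D | ABC) = 0.75
p_test = follow_rate(test, list("ABC"), "D")    # checked on held-out half
print(p_train, p_test)  # 0.75 0.75
```

When the held-out rate matches the fitted one, as here, the model generalizes; a large gap between the two halves would be the warning sign.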

Why not 100%? The claim that nothing can be 100% certain may remind you of a university statistics course, but when it comes to forecasting accuracy the reasons to doubt absolute precision are somewhat different. Since we are trying to predict human behavior in the real world, we must account for factors the program cannot see. For example, we may notice that John buys a large coffee every morning and assume he will do the same tomorrow. Simple, right? But what if something prevents him from buying coffee, say, poor John gets into an accident? We could not have anticipated that and certainly did not build such a possibility into the model, but that does not make our model wrong. It is simply incomplete, that is, not 100% accurate.

There is one more reason we do not consider predictive models 100% accurate: it costs a pseudo-scientist nothing to game the system. For example, they may include every user in the forecast without accounting for false positives and false negatives. This contradicts the scientific approach: anyone can include all players in the model and then predict that each of them will make a $10 purchase tomorrow. Since the forecast covers the entire audience, it will "catch" 100% of the players who actually buy, while its predictions for most players will be wrong.
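The numbers make the trick obvious. Here is a hypothetical audience (the 1000 and 50 are made up for illustration) and the "everyone will buy" model from the paragraph above: it catches every real buyer, yet almost all of its predictions are false positives.

```python
# Hypothetical audience: 1000 players, of whom only 50 actually buy.
players = 1000
actual_buyers = 50

# The pseudo-scientist's model: "every player buys $10 tomorrow."
predicted_buyers = players

true_positives = actual_buyers             # every real buyer was "predicted"
false_positives = players - actual_buyers  # 950 wrong predictions

recall = true_positives / actual_buyers        # 1.0 -- the headline number
precision = true_positives / predicted_buyers  # 0.05 -- the hidden catch
print(recall, precision)  # 1.0 0.05
```

Reporting only the first number (recall) while hiding the second (precision) is exactly the deception the article describes.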

Such cases cast a shadow over predictive analytics and occur far more often than they should. So to trust a forecast's accuracy we need evidence, and the F-score (the F1 score) can serve as that evidence, though other metrics also deserve attention. It is calculated as follows: two indicators that account for false positives and false negatives, precision and recall, are taken, and their harmonic mean is computed. The result is expressed as a percentage and gives a meaningful degree of confidence.
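The F1 calculation itself is one formula. As a sketch, here it is applied to a hypothetical model that flags all 1000 players as buyers when only 50 actually buy: perfect recall, terrible precision, and the harmonic mean punishes the imbalance.

```python
def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# "Everyone buys" on an audience of 1000 with 50 real buyers:
# 50 true positives, 950 false positives, 0 false negatives.
print(round(f1_score(tp=50, fp=950, fn=0), 3))  # 0.095
```

An arithmetic mean of the same precision (0.05) and recall (1.0) would report a flattering 0.525; the harmonic mean collapses to roughly 0.095, which is why the F-score is hard to fake.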

It is nearly impossible to fake, so it does a good job of filtering out results produced by fabricated predictions. Games are a rich source of data, but a high F-score can only be achieved by knowing which variables to include in the analysis. Programmers usually do not understand this well; sociologists do, but they typically lack a full grasp of the technical side. So the right move is to combine both approaches, which is exactly what successful specialists do. In the telecommunications industry, for example, an F-score of 40% is considered good. In the gaming industry you can do better, because our data is more detailed. A score of 50–70% is considered good; anything higher is almost unbelievable. Remember: the F-score is the harmonic mean of two ratios, not a raw percentage. It is, in fact, a rather rough indicator.

Ultimately, the degree of confidence is a key element of a predictive model. You need to know how confidently you can make decisions based on your forecast, given the project's economic model and the degree of risk. The advantage of these metrics is that they bring the scientific virtues of transparency and verifiability. No one will take your word for it. Let the data speak for itself. Listen to it, and you will see the real situation rather than the picture you wish to see.

Source: https://habr.com/ru/post/269637/

