
Methods for assessing the quality of the forecast

Often when preparing a forecast, people forget to assess its results. As it frequently happens, there is a forecast, but no comparison against the actuals. Even more mistakes are made when two (or more) models exist and it is not always obvious which one is better or more accurate. As a rule, a single number (R²) is hard to get by with. It is as if someone told you: this guy wears a blue T-shirt. And everything immediately became clear)

In my articles on forecasting methods I have constantly used these abbreviations and notations when evaluating the resulting models.

I will try to explain what I meant.

Residuals


So, in order. The main quantity through which forecast accuracy is estimated is the residuals (sometimes: errors, e). In essence, this is the difference between the predicted values and the original data (the actual values). Naturally, the larger the residuals, the bigger the mistake we made. To calculate the comparative coefficients, the residuals are transformed: they are either taken in absolute value or squared (see table, columns 4, 5, 6). In raw form they are almost never used, since the sum of negative and positive residuals can reduce the total error to zero. And that, you will agree, would be silly.
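As an illustration only (the numbers below are made up, not the data from the article's table), here is how the residuals and their transformed versions might be computed in Python, using the actual-minus-forecast sign convention:

```python
# Made-up actual and forecast values, for illustration only.
actual   = [320, 280, 350, 410, 300]
forecast = [300, 290, 330, 380, 350]

# Raw residuals: actual minus forecast (positives and negatives can cancel out).
residuals = [a - f for a, f in zip(actual, forecast)]

# Transformed residuals used by the comparative coefficients below.
abs_residuals = [abs(e) for e in residuals]   # absolute values
sq_residuals  = [e ** 2 for e in residuals]   # squares

print(sum(residuals))   # why the raw sum is a poor error measure
print(abs_residuals, sq_residuals)
```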

The severe MSE and R²


When we need to fit a curve to our data, the program evaluates the accuracy of that fit using the mean squared error (MSE). It is calculated by a simple formula:

MSE = (1/n) · Σ (Yᵢ − Ŷᵢ)²

where n is the number of observations, Yᵢ are the actual values and Ŷᵢ are the predicted ones.

Accordingly, when calculating the fitted curve the program seeks to minimize this value. The residuals in the numerator are squared precisely so that the positive and negative ones do not cancel each other out. MSE has no physical meaning, but the closer it is to zero, the better the model.
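A minimal sketch of that calculation (the sample numbers are my own made-up series from the residuals example, not the article's data):

```python
# Mean squared error: the average of the squared residuals.
def mse(actual, forecast):
    n = len(actual)
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n

print(mse([320, 280, 350, 410, 300], [300, 290, 330, 380, 350]))
# The fitting routine tries to make this value as small as possible.
```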
The second abstract quantity is R², the coefficient of determination. It characterizes how closely the predicted values match the original data. Unlike MSE, it does not depend on the units of measurement of the data, so it can be used to compare models. The coefficient is calculated using the following formula:

R² = 1 − MSE / Var(Y)

where Var(Y) is the variance of the source data.

Of course, the coefficient of determination is an important criterion for choosing a model. And if the model correlates poorly with the original data, it is unlikely to have high predictive power.
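Assuming the formula above (1 minus the ratio of MSE to the variance of the original series), R² might be sketched like this, again on my made-up numbers rather than the article's:

```python
# Coefficient of determination: 1 - MSE / Var(Y), with Var(Y) taken as the
# population variance of the original series.
def r_squared(actual, forecast):
    n = len(actual)
    mean_y = sum(actual) / n
    var_y = sum((a - mean_y) ** 2 for a in actual) / n
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n
    return 1 - mse / var_y

print(r_squared([320, 280, 350, 410, 300], [300, 290, 330, 380, 350]))
# The closer to 1, the better the model reproduces the original data.
```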

MAPE and MAD for comparing models


Statistical measures of model quality like MSE and R², unfortunately, are difficult to interpret, so bright minds came up with lightweight coefficients that are more convenient for comparison.

Mean absolute deviation (MAD) is defined as the sum of the absolute values of the residuals divided by the number of observations, that is, the average absolute residual. Convenient? It seems so, and yet not quite. In my example MAD = 43. Expressed in absolute units, MAD shows by how many units the forecast errs on average.
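A possible implementation (again on the made-up series, so the result will not be the article's 43):

```python
# Mean absolute deviation: the average of the absolute residuals,
# expressed in the same units as the original series.
def mad(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

print(mad([320, 280, 350, 410, 300], [300, 290, 330, 380, 350]))
# Reads as: "on average the forecast misses by this many units".
```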

MAPE is designed to give the model an even more tangible meaning. It is the mean absolute percentage error (MAPE):

MAPE = (1/n) · Σ |Yᵢ − Ŷᵢ| / Yᵢ · 100%

where Y is the value of the original series.

MAPE is expressed as a percentage, and in my case it means that the model errs by 16% on average. Which, you will agree, is quite acceptable.
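A sketch of MAPE on the same made-up numbers (note that the measure is undefined if an actual value is zero):

```python
# Mean absolute percentage error: each absolute residual is divided by the
# actual value, then averaged and expressed as a percentage.
def mape(actual, forecast):
    n = len(actual)
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n * 100

print(mape([320, 280, 350, 410, 300], [300, 290, 330, 380, 350]))
# Reads as: "on average the forecast is off by this many percent".
```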

Finally, the last, completely synthetic quantity is Bias, or simply the shift. The thing is that in the real world deviations in one direction are often much more painful than in the other. For example, with conditionally unlimited warehouse space, it is more important to account for cases where real demand jumps above the predicted values. Therefore, the number of cases where the residual is positive is divided by the total number of observations. In my case, 44% of the predicted values were lower than the actual ones. And other evaluation criteria can be sacrificed in order to minimize this Bias.
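Bias, as described here, can be sketched as the share of observations whose residual is positive, i.e. where the forecast came in below the actual value; the data is again my made-up series:

```python
# Bias: share of observations where the actual value exceeded the forecast.
def bias(actual, forecast):
    positive = sum(1 for a, f in zip(actual, forecast) if a > f)
    return positive / len(actual) * 100

print(bias([320, 280, 350, 410, 300], [300, 290, 330, 380, 350]))
# E.g. 60% here means the forecast underestimated the actuals in 60% of cases.
```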

You can try it yourself in Excel and Numbers

It would be interesting to know: what forecast quality assessment methods do you use in your work?

Details on the blog

Source: https://habr.com/ru/post/19657/

