
Forecasting the execution time of typical sequential "heavy" computations on a multiprocessor system under unpredictable job arrivals

The idea of writing a first article for Habr came up while "gathering stones" for a server system whose main tasks are:



A number of circumstances should be clarified:



The author's initial attempts to predict the computation time were not very accurate. In the "by eye" approach, the dependence of the time on N was fitted with power-law coefficients chosen so as to minimize the average deviation. The test calculations gave rather disappointing results:
| N | Real time (s) | Estimated time (s) |
|---|---|---|
| 417 | 9.60 | 5.04 ± 0.96 |
| 689 | 26.20 | 18.33 ± 3.50 |
| 1165 | 78.40 | 70.56 ± 13.48 |
| 2049 | 264.60 | 300.34 ± 57.40 |
| 3001 | 654.20 | 799.24 ± 152.74 |
| 3341 | 875.80 | 1052.50 ± 201.13 |
| 4769 | 2323.60 | 2621.69 ± 501.01 |
| 5449 | 3424.10 | 3690.10 ± 705.18 |
| 6129 | 4811.80 | 4988.97 ± 953.40 |
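The exact "by eye" fitting procedure is not described in the article, but a rough power-law approximation of the form t ≈ a·N^b can be obtained, for example, with NumPy by a straight-line fit in log-log coordinates (a sketch only, using the measurements from the table above):

```python
import numpy as np

# Measured statistics from the table above: problem size N and real computation time (s).
N = np.array([417, 689, 1165, 2049, 3001, 3341, 4769, 5449, 6129])
t = np.array([9.60, 26.20, 78.40, 264.60, 654.20, 875.80, 2323.60, 3424.10, 4811.80])

# Fit t ~ a * N^b as a straight line in log-log coordinates.
b, log_a = np.polyfit(np.log(N), np.log(t), 1)
a = np.exp(log_a)

for n, real in zip(N, t):
    print(f"N={n}: real {real:8.2f} s, power-law estimate {a * n ** b:8.2f} s")
```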

As often happens, the classics came to the rescue. The long-forgotten words "interpolation" and "extrapolation" came back to mind, and a search of the Internet began to check whether the chosen direction was correct. A number of posts, articles and other resources were read. The main conclusion is as follows: to predict the values of an unknown function of one variable, Chebyshev or Lagrange polynomials are most often used. It turned out that Chebyshev polynomials (which, in principle, could be replaced by an ordinary least-squares fit) make sense when there is no confidence in the available "argument-value" data, while Lagrange polynomials are good in the opposite case. The second option looked like exactly what was needed. Indeed, why not create "ideal" conditions for testing the computation algorithm on a specific system (pick a core and load it to 100%) and collect enough statistics to construct a Lagrange polynomial?
[Figure: a graphic example of constructing the Lagrange polynomial]
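A minimal sketch of that idea, assuming the statistics above were collected under such "ideal" conditions and using SciPy's ready-made Lagrange routine (not necessarily the implementation the author relied on):

```python
import numpy as np
from scipy.interpolate import lagrange

# "Ideal-conditions" statistics: problem size N -> measured computation time (s).
N = np.array([417, 689, 1165, 2049, 3001, 3341, 4769, 5449, 6129])
t = np.array([9.60, 26.20, 78.40, 264.60, 654.20, 875.80, 2323.60, 3424.10, 4811.80])

# Build the Lagrange interpolation polynomial through all measured points.
# (scipy.interpolate.lagrange is numerically fragile beyond roughly 20 points.)
poly = lagrange(N, t)

# Forecast for a new job whose size lies inside the measured range.
print(poly(2500))  # estimated computation time in seconds for N = 2500
```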

Meanwhile, there are already implemented algorithms for constructing the Lagrange polynomial, for example, here. Adapted to the task, the algorithm began to show comparable results (up to the deviations) when extrapolating on a growing set of statistics: for the forecast at N = 1165 the real times for N = 417 and N = 689 were used; for the forecast at N = 2049: N = 417, N = 689 and N = 1165, and so on. A sketch of this procedure is given after the table:
| N | Real time (s) | Estimated time (s) |
|---|---|---|
| 417 | 9.60 | - |
| 689 | 26.20 | - |
| 1165 | 78.40 | 55.25 |
| 2049 | 264.60 | 253.51 |
| 3001 | 654.20 | 617.72 |
| 3341 | 875.80 | 863.74 |
| 4769 | 2323.60 | 2899.77 |
| 5449 | 3424.10 | 2700.96 |
| 6129 | 4811.80 | 5862.79 |
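The adapted implementation itself is not shown in the article, but the procedure can be sketched roughly as follows (up to rounding and floating-point error it should reproduce the estimates in the table above):

```python
import numpy as np
from scipy.interpolate import lagrange

N = np.array([417, 689, 1165, 2049, 3001, 3341, 4769, 5449, 6129])
t = np.array([9.60, 26.20, 78.40, 264.60, 654.20, 875.80, 2323.60, 3424.10, 4811.80])

# Starting from the third measurement, build the polynomial only on the
# preceding points and extrapolate it to the current N.
for i in range(2, len(N)):
    poly = lagrange(N[:i], t[:i])
    print(f"N={N[i]}: real {t[i]:8.2f} s, extrapolated {poly(N[i]):8.2f} s")
```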

When interpolating with the Lagrange polynomial, the results turned out to be impressive starting from N = 2049 (each measurement was excluded from the sample in turn and the forecast for the excluded point was computed on the remaining ones; a sketch of this check is given below):
| N | Real time (s) | Estimated time (s) |
|---|---|---|
| 417 | 9.60 | - |
| 689 | 26.20 | 2.58 |
| 1165 | 78.40 | 98.35 |
| 2049 | 264.60 | 245.74 |
| 3001 | 654.20 | 664.15 |
| 3341 | 875.80 | 862.91 |
| 4769 | 2323.60 | 2407.75 |
| 5449 | 3424.10 | 3251.71 |
| 6129 | 4811.80 | - |

Indeed, at N = 689 and N = 1165 the deviation of the predicted value from the real one was about 90% and 25% respectively, which is unacceptable; however, starting from N = 2049 the deviation stayed within 1.5-7%, which can be treated as a good result.
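A sketch of this leave-one-out check (interior points only, as in the table above):

```python
import numpy as np
from scipy.interpolate import lagrange

N = np.array([417, 689, 1165, 2049, 3001, 3341, 4769, 5449, 6129])
t = np.array([9.60, 26.20, 78.40, 264.60, 654.20, 875.80, 2323.60, 3424.10, 4811.80])

# Exclude one interior measurement at a time, build the polynomial on the rest
# and compare the interpolated value with the excluded real measurement.
for i in range(1, len(N) - 1):
    keep = np.arange(len(N)) != i
    poly = lagrange(N[keep], t[keep])
    deviation = abs(poly(N[i]) - t[i]) / t[i] * 100
    print(f"N={N[i]}: real {t[i]:8.2f} s, interpolated {poly(N[i]):8.2f} s ({deviation:5.1f}% off)")
```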
Polynomial: F(x) = -5.35771746746065e-22*x^7 + 1.23182612519573e-17*x^6 - 1.14069960055193e-13*x^5 + 5.44407657742344e-10*x^4 - 1.39944148088413e-6*x^3 + 0.00196094886960409*x^2 - 1.22001773413031*x + 263.749105425487
If necessary, the degree of the polynomial can be reduced to the desired one by a small change to the algorithm (in principle, for the author's task a 3rd-degree polynomial would have been enough).
[Figure: graphs of this polynomial]
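Once such a polynomial is in hand, the forecast for an incoming job is just an evaluation; for example, with NumPy (the coefficients below are the ones printed above):

```python
import numpy as np

# Coefficients of the polynomial above, highest degree first.
forecast = np.poly1d([-5.35771746746065e-22, 1.23182612519573e-17,
                      -1.14069960055193e-13, 5.44407657742344e-10,
                      -1.39944148088413e-6, 0.00196094886960409,
                      -1.22001773413031, 263.749105425487])

print(forecast(2049))  # close to the measured 264.6 s
```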

It is important to note that this is an interpolation polynomial, so for values outside the range [N_min; N_max] the quality of the forecast leaves much to be desired:



Conclusion: it makes sense to collect enough statistics in advance under "ideal" conditions (especially for large values of N), build the Lagrange polynomial and then, as jobs arrive, obtain the forecast of their computation time by substituting N_i into the ready polynomial.

This approach is quite straightforward; however, when we speak about a multiprocessor (multi-core) system, the situation becomes more complicated:


Analysis of these aspects showed that it is impossible to predict the execution time of a job fully in advance, but this value can be adjusted whenever certain events occur that affect the computing resource currently allocated to the calculations, and the result can then be reported to the user. These events are the arrival of a new task and the completion of a running task. It is upon these events that it is proposed to perform the forecast correction.


It becomes clear that when either of these events occurs, it is necessary to recalculate the resource allocated to each task (in cases with a known distribution of resources between tasks) or to obtain this value in some way using built-in (language or operating-system) tools, and then recalculate the remaining time taking the allocated resource into account. But for this we need to know how much work the process has already done. Therefore, it is proposed to organize some storage (possibly a table in a DBMS) that will hold the necessary information, for example:
| Task id | Start | End | Lagrange(N) | Done | Last request | Last correction | Current CPU |
|---|---|---|---|---|---|---|---|
| 2457 | 12:46:11 | ... | Lagrange(N) | 0.45324 | ... | 12:46:45 | 1.00 |
| 6574 | 12:46:40 | ... | Lagrange(N) | 0.08399 | ... | 12:46:45 | 1.00 |
| 2623 | 12:44:23 | 12:46:45 | Lagrange(N) | 1.00 | ... | 12:46:45 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
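One possible shape of such a record (a sketch only; the field names simply mirror the columns of the table above and are not taken from a real schema):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TaskRecord:
    """One row of the bookkeeping storage described above."""
    task_id: int
    started_at: datetime
    finished_at: Optional[datetime]  # None while the task is still running
    ideal_forecast_s: float          # Lagrange(N): forecast under "ideal" conditions
    done_fraction: float             # share of the work already completed, 0.0 .. 1.0
    last_request_at: Optional[datetime]
    last_correction_at: datetime
    current_cpu_share: float         # CPU resource currently allocated to the task

    def remaining_time_s(self) -> float:
        """Remaining-time forecast, given the currently allocated resource."""
        if self.current_cpu_share <= 0.0:
            return 0.0
        return self.ideal_forecast_s * (1.0 - self.done_fraction) / self.current_cpu_share
```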

It is proposed to implement the following algorithm (metacode):
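The author's metacode is not reproduced here; a rough Python sketch of the correction logic described above might look as follows (it assumes, purely for illustration, that the CPU-bound tasks share the cores evenly, whereas a real system would query the actual allocation):

```python
import time

NUM_CORES = 4  # assumed number of cores available for the calculations

# In-memory stand-in for the storage table: task_id -> bookkeeping record.
tasks: dict[int, dict] = {}

def on_event(now: float) -> None:
    """Correction step, run when a new task arrives or a running task finishes."""
    running = [rec for rec in tasks.values() if rec["done"] < 1.0]
    if not running:
        return
    new_share = min(1.0, NUM_CORES / len(running))  # even split, at most one core each

    for rec in running:
        # Advance the completed fraction by the work done since the last correction,
        # using the CPU share that was in effect during that interval.
        elapsed = now - rec["last_correction"]
        rec["done"] = min(1.0, rec["done"] + elapsed * rec["cpu_share"] / rec["ideal_s"])
        # Switch to the newly allocated share and refresh the remaining-time forecast.
        rec["cpu_share"] = new_share
        rec["last_correction"] = now
        rec["remaining_s"] = rec["ideal_s"] * (1.0 - rec["done"]) / new_share

def submit(task_id: int, ideal_forecast_s: float) -> None:
    """Register a newly arrived task and trigger a correction."""
    now = time.time()
    tasks[task_id] = {"ideal_s": ideal_forecast_s, "done": 0.0, "cpu_share": 0.0,
                      "last_correction": now, "remaining_s": ideal_forecast_s}
    on_event(now)

def finish(task_id: int) -> None:
    """Mark a task as completed and redistribute the freed resource."""
    now = time.time()
    tasks[task_id]["done"] = 1.0
    tasks[task_id]["remaining_s"] = 0.0
    on_event(now)
```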


Conclusion: having a forecast computed in advance for "ideal" conditions, we can correct the predicted completion percentage of "heavy" sequential calculations practically "on the fly" as the load on the system changes, and give these values to the user upon request.

Specific implementations may have their own peculiarities that complicate or simplify life, but the purpose of the article is not to analyze delicate cases, only to submit the approach described above to the reader's fair judgment (primarily regarding its viability).

Source: https://habr.com/ru/post/258791/

