📜 ⬆️ ⬇️

Analysis of software performance using mathematical experiment planning

“Premature optimization is the root of all evil”
Anthony Hoar

Greetings to all users of Habr!
This article originated as a useful by-product of my scientific research. I would be happy if the ideas outlined below seem interesting and useful to you, and even better if they get their application and further development in real-life projects.

Software performance (software) is an important aspect in the development of any software product. The urgency of the issue is due to the ever-increasing complexity and significance of software. Particular attention is paid to performance:
A thorough analysis of the performance of a software product can significantly reduce the cost of the equipment itself and the cost of maintaining performance, increase the loyalty of software users.
The concept of performance in terms of software means either productivity or reactivity :
In this paper, reactivity will be considered as a measure of software performance. Such a choice is not fundamental and the only possible, and was made only to simplify measurements in experiments. Consideration of productivity requires a more complicated test procedure, since measurements of the volumes of information processed are not always possible. Estimation of data processing time by the system is much easier to implement, incl. and automate the testing process.
When analyzing software, the most obvious and logical task facing the researcher will be an increase in productivity. Formally, we have an optimization problem, and in the context of this study, the problem of minimizing the time of processing incoming information by a software system. Thus, the optimization criterion is a certain function.

where (1)
The choice of the entire set of factors affecting program performance is not a trivial problem. As a rule, modern software is an extremely complex system with a huge number of links and dependencies. And it's not just the complexity of the software itself. Any program has a runtime programming language (runtime), operates in the operating system, interacts with other services and software — all of these elements are also separate complex systems with no fewer internal connections and dependencies. If we also add mutual dependencies of factors (which can be not only paired and / or linear), then we get an incredible number of various combinations of factors.
For example, consider a program that works with a database (DB) and produces statistical processing of the information stored in it. Examples of factors affecting performance include:
The possibility of dependencies between these factors is obvious. For example, increasing the amount of RAM can reduce the need for frequent access to the hard drive, and, consequently, reduce the influence of the corresponding factor.
')
To find the optimal values ​​of influencing factors, i.e. To solve the optimization problem (1), we can offer the following options:
All of these options have their advantages and disadvantages. Using a full search it can be argued that the desired parameters will be found, however, with a large number of factors and variants of their values, the number of all possible combinations may be too large, and the experiments will take too much time. The number of all possible combinations of N can be found by combinatorial rules:

where (2)
Then, with 5 factors and 5 possible values ​​of each factor, we get combinations.
With a random choice of the optimal combination, there is a high probability that the solution obtained will be very far from the global optimum.
Analytical study of the system is often difficult or impossible when analyzing existing products without source code. In addition, such an approach requires a full understanding by the researcher of all the algorithms used in the software, connections and dependencies of the components.
Special software tools such as profilers [5] provide only some statistical information about the execution of the program code: the number of method calls, the average execution time of methods, etc. Optimization in this case comes down to identifying the so-called. Bottlenecks and optimization of the algorithms used. Such an approach is quite popular, but it does not allow one to obtain the desired solution to problem (1) .
Mathematical models of software performance analysis were repeatedly considered by foreign and domestic authors. So in [1] , [2] , original approaches to solving this problem were proposed at different stages of software development.
Summing up, it should be noted that the main disadvantages of the proposed methods of solution are:
To overcome the above difficulties and solve the problem, it is proposed to use an apparatus developed in the theory of mathematical experiment planning (MBE).
Let us dwell in more detail on the question of the applicability of the MBE to software performance analysis. One of the main ideas of experiment planning is the use of a black box for the object under study cybernetic abstraction [3] (see Fig. 1 ).


Fig. 1. Abstraction black box.

Such an abstraction implies a refusal to consider the internal mechanisms of the phenomenon or object under study due to its great complexity. Analysis of the phenomenon is reduced to the analysis of input parameters affecting the object (factors) and output characteristics (responses) [3] .
For the applicability of the MPE is necessary to comply with a number of conditions [3]:
After analyzing the analogies, we can conclude about the possibility of considering the software system as a black box. Then, with the fulfillment of the above requirements, the entire existing MPE apparatus can be used to study the performance.
As an example, and to refine the method of using MBE for analyzing the performance of a software system, consider a web application that is part of the Professional Clubs project [4] .

Stage 1. Analysis of a priori information.

When analyzing a priori information about a software product, it became known:
As a mathematical model of the experiment [3], we will use the simplest one — a linear model without dependencies between factors. Most likely, such an assumption and such a combination of factors do not correspond to the actual situation, but may well be used to test the method itself. Then this task is reduced to finding the values ​​of the coefficients. regression equations

. (3)

The factors with the highest values ​​will have the greatest effect on the output response.

Stage 2. The choice of influencing factors.

Table 1 presents a set of influencing factors selected as a result of analyzing a priori information about software.

Table 1.
FactorDescription
MySQL key_buffer_size
A DBMS tuning parameter that determines the amount of memory allocated for storing indexes [6] .
MySQL table_cache
DBMS tuning parameter that determines the number of constantly open database tables [6] .
MySQL query_cache_limit
DBMS tuning parameter that determines the maximum amount of memory for caching query results [6] .
Caching data by the application.
It can be either on or off.
Web server and PHP interpreter launch method.
The application under study can be launched in two ways:
  • Apache web server + PHP module;
  • web server nginx + php-fpm.

Stage 3. The choice of the upper and lower levels for the factors.

Now you need to select the upper and lower levels for each factor [3] . Hereinafter, we will use the notation adopted in the IPE:
Table 2.
FactorTop Level (+1)Lower level (-1)
265 Mb.16 Mb.
30064
64 Mb.1 Mb.
Cache is enabled.The cache is turned off.
Nginx + php-fpm.Apache + mod_php.

Stage 4. Drawing up a planning matrix and conducting experiments.

Table 3 presents the results of a series of experiments.

Table 3.
NoyNoy
one-one-one-one-one-one6,90945690217+1-one-one-one-one6.956250343
2-one-one-one-one+16,26592088518+1-one-one-one+16.27117213
3-one-one-one+1-one1,046864681nineteen+1-one-one+1-one1,049605346
four-one-one-one+1+10,95928777720+1-one-one+1+10.960128005
five-one-one+1-one-one6.92249123821+1-one+1-one-one6.94905457
6-one-one+1-one+16,29213854122+1-one+1-one+16,288483698
7-one-one+1+1-one1,04732769323+1-one+1+1-one1,048429732
eight-one-one+1+1+10,95917846424+1-one+1+1+10.959984639
9-one+1-one-one-one6,94782815925+1+1-one-one-one6.944574752
ten-one+1-one-one+16.26996142126+1+1-one-one+16.281574535
eleven-one+1-one+1-one1,04703259527+1+1-one+1-one1,047937875
12-one+1-one+1+10.96007624428+1+1-one+1+10,960813348
13-one+1+1-one-one6.95416094329+1+1+1-one-one6,952602925
14-one+1+1-one+16.278223336thirty+1+1+1-one+16.284795263
15-one+1+1+1-one1,04801948331+1+1+1+1-one1,047952991
sixteen-one+1+1+1+10,96055920632+1+1+1+1+10,960591927

Stage 5. Analysis of the results.

The coefficients of the regression equation (3) can be found as


. (four)

The results are presented in table 4 and fig. 2

Table 4.
3,973612110.05197110.02450410,0034686-2,9182692-0,2483070



Fig. 2. The coefficients of the regression equation.

Stage 6. Conclusion.

From fig. 2 that the main contribution to the time of web page generation is made by the factor that characterizes data caching in the application. Negative regression coefficient means that this parameter reduces the black box response function. In our example, this means a reduction in page generation time.
Strictly speaking, such results are obvious, since the inclusion of data caching by the application entails a minimal number of queries to the DBMS. Thus, the effect of the DBMS settings becomes insignificant. The obtained results confirm the possibility of using the MBE in relation to the analysis of software performance.

In this article, I tried to consider the possibility of applying the existing method of mathematical planning of experiment for a long time as applied to the analysis of software performance. Such an approach makes it possible to overcome a number of difficulties arising in the analysis of software systems, to identify the factors that most strongly influence the analyzed characteristic, to reveal the dependencies between the factors.
Of course, the presented methodology does not claim to replace the existing methods of performance analysis, for example, profiling. However, there are a number of tasks in which the use of such a technique can significantly facilitate the tasks of a researcher.

Thanks for attention!

Bibliography

  1. Dubakov S.A. Information technology analysis of performance in the software development process: dis. Cand. tech. Sciences / Dubakov SA - Tomsk, 2005. - 135 p.
  2. Moiseychuk LD Development of models and methods for analyzing software performance based on strictly hierarchical stochastic Petri nets: dis. Cand. tech. Sciences / Moiseychuk LD - St. Petersburg, 2002. - 152 p.
  3. Adler Yu.P., Markova EV, Granovsky Y.V. Planning an experiment when searching for optimal conditions. - M .: Science, 1976. - 279 p.
  4. LLC "Professional Clubs". - URL: http://prof-club.ru/ . Appeal date: 25.09.2011.
  5. Kaspersky K. Technique program optimization. Efficient use of memory. - SPb .: BHV-Petersburg, 2003. - 464 p.
  6. MySQL Server System Variables. - URL: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html . Appeal date: 25.09.2011.

Source: https://habr.com/ru/post/129346/


All Articles