
The complexity of load testing - an interview with Vladimir Sitnikov (Netcracker) and Andrey Dmitriev



On the eve of the Heisenbug conference, we talked about the intricacies of load testing with Vladimir (vladimirsitnikov) Sitnikov, who has worked on the performance and scalability of Netcracker OSS for 10 years (the software telecom operators use to automate the management of networks and network equipment) and deals with Java and Oracle Database performance issues, and Andrei (real_ales) Dmitriev, a Java programmer who worked on the JDK at Sun and Oracle, led an Android development team at QuickOffice, and at Netcracker headed the group responsible for load testing the OSS platform (Java, Oracle DB, JMeter, etc.).

JUG.ru: Please tell us about your work and the role that load testing plays in it.

Andrei Dmitriev: Until recently, I was responsible for the quality of Netcracker solutions in terms of performance, endurance, and scalability.

Netcracker is a platform deployed by many telecom operators, written almost 100% in Java. It has been developed over a long time and works with a huge number of third-party hardware and software systems. So there is room to experiment with completely different cases: network performance, the frontend, SQL queries, and the like. That is, we do not have a single case that we solve over and over; the cases are always new, which is what makes it interesting.

My task was to find problems in Netcracker solutions. When a problem was discovered, I did a first-pass analysis, supplied it with additional details, collected the necessary metrics (took dumps) and, roughly speaking, handed the task over to a performance engineer.

Vladimir Sitnikov: I am a performance architect. My task is to analyze measurement results and to recommend which measurements are best carried out in order to expose problems of one class or another. To solve problems, it is usually necessary not only to take these measurements but also to understand the results. So my day-to-day work is peering into a cloud of numbers and trying to extract something useful from it.

It is worth noting that problems and tasks do not always come from tests. Sometimes you have to deal with incidents that occur in live production systems. In that case, looking at the problem from two angles (the test environment and production), you need to reproduce and fix it quickly.

JUG.ru: With which questions, in your opinion, should load testing begin? Can you highlight some of the most important?

Andrei: In my opinion, there are no unimportant questions here. Any question can have a decisive influence on the testing and development process.

However, one should always begin with what the customer expects from the system being developed. To that end, we analyze the existing system or model the customer has: what data will be migrated, which scenarios will run, which data will be used, and on which hardware it will all work.

But our reality is that we begin testing as soon as we receive the first information about the project. Naturally, if we have no binaries, we cannot test anything, but we can analyze and prepare data and make plans. Information about the hardware the solution will run on, about the architecture, the data involved, and the scenarios is collected incrementally, and with each new piece of information we adjust our work on preparing and conducting the testing.

It never happens that we collect all the data and then test for the next three months without any deviation from the plan drawn up at the start. There are always corrections.

For example, we took on a solution and started testing it, asking the IT team to deploy a default project (the way we usually deploy). After deployment we ran the first iteration of testing, and it suddenly turned out that the approach was completely wrong, because instead of two nodes we had to build a cluster. As a result, we tore down the entire test environment and reconfigured the hardware in line with the new information we had received.

The complexity of testing enterprise applications


JUG.ru: Pitfalls when testing enterprise applications - please name a couple of the most obvious.

Andrei: These are not pitfalls, this is a whole minefield.

I have already said that every element of a test can be of key importance: it can radically upend our understanding of the testing process, change the results, or invalidate them entirely. Accordingly, there is a rake to step on at every turn.

The first and most frequent is a mistake in the numbers. For example, the customer may not understand what his load actually is. He builds some basic model, which we receive as input for the test plan. At some point it turns out that the numbers were not calculated quite accurately, or that we interpreted them incorrectly. For example, there is a use case for some service, and it has a sub-case: the same service with an add-on.

The customer counted all of this as one case, while in fact users launch these cases in different ways (that is, they land in different statistics). So the customer may assume the case is executed 100 thousand times when users actually run it 200 thousand times.

The second point: very often we face a lack of the rights needed for testing on the customer's equipment. And then it begins... We need to set up a testing framework, and for that we need countless approvals; then we need access to the database, and so on. Without the necessary rights, you have to improvise.

For example, we went to a customer and ran a load test that used 500 thousand units of some kind from the database. As part of the test we connected directly to the customer's database and marked the units as used. Say there are 600 thousand in total, and we have already "burned" 500 thousand of them. Now we have to invent a way to return those 500 thousand objects to their original state. If we don't think about that beforehand, we get exactly one chance to run this load test.

When we worked offsite on our own machines, we ran an SQL script that simply returned this data to its original state. Unfortunately, that script did not work onsite at the customer, and we had to invent a different way.
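
A minimal sketch of the safety net this implies: snapshot the state of the affected objects before the test and restore it afterwards. The UNITS table, its STATUS column, and the connection details are invented placeholders, not Netcracker's actual schema or tooling.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Snapshot and restore the state of test objects around a load test.
    // Table, column, and connection details are hypothetical.
    public class UnitStateGuard {
        public static void main(String[] args) throws Exception {
            try (Connection c = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/TEST", "user", "pass");
                 Statement st = c.createStatement()) {
                c.setAutoCommit(false);

                // Before the test: remember the status of every unit.
                st.execute("CREATE TABLE units_backup AS "
                         + "SELECT id, status FROM units");

                // ... the load test runs here and "burns" units ...

                // After the test: put every unit back as it was.
                // Assumes the test did not insert brand-new units.
                st.execute("UPDATE units u SET status = "
                         + "(SELECT b.status FROM units_backup b WHERE b.id = u.id)");
                st.execute("DROP TABLE units_backup");
                c.commit();
            }
        }
    }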

Vladimir: I want to note that the problem of interpreting the figures, which Andrei mentioned, is tightly connected with the correctness of the data we test on, with its completeness and its very nature. When we test some narrow piece of functionality and create an insufficient load (or apply the load in the wrong place entirely), the results of our measurements can only be trusted with big reservations. I would say that one of the key testing problems is creating the right amount of data in order to apply the load properly.

JUG.ru: What actions need to be performed before the start of load testing?

Andrei: The very simple answer is: functional tests.

Second, I would note that you need to make sure the hardware and the solution are ready for testing: all interfaces are configured correctly, there is no parasitic load, and nobody is doing anything else on the machine. By the way, we have special scripts for this that run before the tests and report whether everything is fine or something needs correcting.
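
The pre-flight scripts mentioned here could look roughly like this; a sketch, not the team's actual tooling. It checks for parasitic CPU load and that a dependency is reachable; the host, port, and threshold are made-up values.

    import java.io.IOException;
    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Pre-test sanity check: refuse to start if the machine is already busy
    // or a dependency is unreachable. Host, port, threshold are illustrative.
    public class PreflightCheck {
        public static void main(String[] args) throws IOException {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            double load = os.getSystemLoadAverage();   // -1 if the platform cannot report it
            if (load > os.getAvailableProcessors() * 0.2) {
                throw new IllegalStateException(
                    "Parasitic load detected: load average " + load);
            }
            try (Socket s = new Socket()) {            // is the database reachable?
                s.connect(new InetSocketAddress("dbhost", 1521), 2000);
            }
            System.out.println("Environment looks clean, ok to start the test");
        }
    }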

Here is an example: some time ago we would roll back an Oracle database to a specific checkpoint. We would run the test, collect the metrics, and return the database to its initial state with a rollback. At some point we forgot to perform that last step. When we then started the tests, they began to fail, and it was very hard to get to the real cause, because the fact that we had not rolled back is very difficult to track down.
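
One cheap way to make a forgotten rollback visible, sketched under assumptions: record a sentinel value at baseline time (here, the row count of a table the test mutates) and compare it before every run. The table name and count are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Fail fast if the database was not restored to its baseline.
    // BASELINE_COUNT and the ORDERS table are hypothetical.
    public class BaselineGuard {
        private static final long BASELINE_COUNT = 600_000L; // recorded after a clean restore

        public static void main(String[] args) throws SQLException {
            try (Connection c = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/TEST", "user", "pass");
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM orders")) {
                rs.next();
                long actual = rs.getLong(1);
                if (actual != BASELINE_COUNT) {
                    throw new IllegalStateException("Database was not rolled back: "
                        + actual + " rows instead of " + BASELINE_COUNT);
                }
            }
        }
    }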

JUG.ru: How much detailed information about the application do you need to have before planning your testing?

Andrei: Indeed, requirements can be gathered indefinitely. We will receive the final version of the requirements only when the end user goes live, but by then it is too late to take any measurements: everything is already in production. So our task is to conclude, from data that is not known exactly, what testing will be useful.

Moreover, if we are talking about project work rather than product work, nobody will wait for anybody. You have two weeks to collect requirements: you need to use those two weeks to at least start writing tests. Later, as new information arrives, the requirements will be updated and the tests rewritten.

Here is an example. Imagine the customer says: "We will have 3 thousand cases, but the most popular are these 10 or 20." We start testing them: we write tests so that these cases are executed 100-200 thousand times a day. After running the test it turns out that these top cases do not create a heavy load: the CPU sits at zero and the hard disk is barely touched. And then it turns out that among those 3 thousand cases there are scenarios called only 300 times a day (which we, of course, ignored), but these are heavy searches that can generate the lion's share of the load, for example on the hard disk.

This example illustrates well that not only a quantitative but also a qualitative assessment of the cases matters, i.e. deep information about the project.
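
The point can be made concrete with a back-of-the-envelope model (all figures invented): a scenario's share of the load is calls per day times cost per call, so a rare but heavy search can dominate the popular cases.

    // Toy workload model: expected daily load = calls/day x relative cost per call.
    // All figures are invented for illustration.
    public class WorkloadModel {
        record Scenario(String name, long callsPerDay, double costPerCall) {}

        public static void main(String[] args) {
            Scenario[] mix = {
                new Scenario("create order (top case)", 200_000, 1.0),
                new Scenario("heavy search (rare case)",    300, 2_000.0),
            };
            double total = 0;
            for (Scenario s : mix) total += s.callsPerDay() * s.costPerCall();
            for (Scenario s : mix) {
                double share = 100.0 * s.callsPerDay() * s.costPerCall() / total;
                System.out.printf("%-28s %5.1f%% of the load%n", s.name(), share);
            }
        }
    }

With these invented numbers the rare search accounts for 75% of the expected load, despite being called a thousand times less often.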

Vladimir: Andrei touched on a very interesting hidden rake: how to properly assess the complexity of an operation. There has to be some kind of expertise (a developer's or analyst's instinct) to suggest where the potentially problematic scenarios are and where they are not.

You can detail the project to the maximum, but nobody will be able to work with that. Load testing, at least for enterprise applications, has its limits, so you need to somehow isolate the details that matter most for testing. How? We can either follow what the customer says ("it is very important that this button responds within two tenths of a second, because the vice president will press it during the demo"), or follow a developer's or analyst's hunch that a given piece of functionality is hard to implement and therefore needs more thorough testing.

Accordingly, in real projects we deal not only with numbers attached to cases, but also with expert assessments of the complexity of those cases.

JUG.ru: What role do hardware glitches play in load testing?

Vladimir: In my experience, situations where the equipment does not work as expected are rare. You can say a lot about memory and hard drives periodically failing, but for us it happens infrequently. And, as a rule, if a server loses a hard drive, we lose the test rig and, accordingly, the time needed to restore it or rebuild the server from scratch.

Andrei: Besides lost time, there can be lost results. Worst of all is when the configuration changes in the middle of testing. We take the first measurement, and then hop: the config has changed. The next measurement gives a different result, and it is unclear whether the hardware changed or the solution did.

However, much more often we face not failures but the limitations of the equipment: the hardware simply cannot cope with the load. It seems to me this can be handled with two main approaches. The first is distributing the load on the hard disk over time (for example, we do not perform some operation during the day but shift it to a night window when users are not touching the system). The second is plain old query optimization.

JUG.ru: What kind of margin, in terms of load, do you need to build in when load testing, in your experience?

Andrei: Usually we plan for 120-150% of the base load. But if we expect some components to be weak, i.e. not ready for that kind of workload, or if we have reason to believe the customer's estimates are off, we can increase the margin.

And sometimes we even check at 400%. Usually we do not test the entire system at 400%; we pick individual components and check how stable they are. It is much easier to test (and then retest) one small component if we believe it may be the weak link.

JUG.ru: How often does this margin save you?

Vladimir: Often or not is difficult to measure. How often do you need a reserve parachute?

But let me put it differently. It often happens that some abnormal situation occurs in the production system: for example, one of the components freezes and data gets stuck in it. Then it comes unstuck, and all that data flows further downstream. In an ideal world there are limiters everywhere, of course, but in reality such a jam can easily lead to a flurry of messages, an increased load on all the other parts of our stack. Of course, we can say we do not support this and that is why the system fell over, but that does not look very professional. It is better if the system somehow rides it out or at least does not lose consciousness.

Choosing between speed and quality


JUG.ru: Better testing takes more time, but time is usually limited. How do you find the right balance between speed and quality? How do you increase the speed of testing?

Vladimir: You need to test "from here until dinner."
As for increasing the speed of testing, in my opinion there is a fairly simple answer: you have to break the complex system into pieces and test the individual components first. That is what gives acceleration. When there is some big system, we have many, many scenarios, but they never arrive for testing all at once. As a rule, we get pieces, so first we test those pieces independently; then, once everything is more or less run in, we build tests that involve many components at the same time.

Andrei: Continuing the conversation about components...
There is such a thing as stubs for third-party systems. When we test a solution outside the customer's environment, we do not have the real hardware and real services that issue phone numbers, names, IMEIs, and so on. We emulate all of this synthetically with "stubs", which we call either emulators or stubs. We usually start testing with them. Stubs are also part of the project and must also withstand a certain load. They may answer in monosyllables, but they are written using our internal framework, and that framework can have performance problems of its own. Therefore, we begin by testing the stubs, loading them 40-50 times harder than they will be loaded during the tests. That way we do not have to worry about them during the end-to-end tests.
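
A "monosyllabic" stub of the kind described might be no more than this, hand-rolled on the JDK's built-in HTTP server; the endpoint and payload are invented. Even something this small has to be load-tested itself before it stands in for a real phone-number service.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    // A stub for a third-party service: always answers with a canned
    // phone-number payload. Endpoint and response format are invented.
    public class PhoneNumberStub {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/allocateNumber", exchange -> {
                byte[] body = "{\"msisdn\":\"+15550000001\"}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
            });
            server.setExecutor(null); // default executor is enough for a sketch
            server.start();
        }
    }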

JUG.ru: Are there typical problems in analyzing the results obtained (roughly speaking, looking at "good" data incorrectly)?

Vladimir: All analysis problems boil down either to the data not having been collected, or to the load not having been applied the way it originally should have been. For example, this happens when nobody looks at the percentage of failed scenarios, i.e. when response time is measured without paying attention to errors (and an error can have a completely different response time and a completely different load).
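
Since JMeter is part of the stack here, a sketch of the check Vladimir describes: read a JTL results file and report the error percentage next to the response times, instead of averaging everything blindly. It assumes the default CSV column layout (elapsed at index 1, success at index 7) and does a naive split, i.e. no quoted commas in the fields.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    // Report error percentage alongside response time from a JMeter JTL (CSV).
    // Assumes the default column layout and unquoted fields.
    public class JtlErrorRate {
        public static void main(String[] args) throws Exception {
            List<String> lines = Files.readAllLines(Path.of("results.jtl"));
            long ok = 0, failed = 0, okTime = 0, failedTime = 0;
            for (String line : lines.subList(1, lines.size())) {  // skip header
                String[] f = line.split(",");
                long elapsed = Long.parseLong(f[1]);
                if (Boolean.parseBoolean(f[7])) { ok++; okTime += elapsed; }
                else { failed++; failedTime += elapsed; }
            }
            double errorRate = 100.0 * failed / Math.max(1, ok + failed);
            System.out.printf("errors: %.2f%%, avg ok: %d ms, avg error: %d ms%n",
                errorRate, okTime / Math.max(1, ok), failedTime / Math.max(1, failed));
        }
    }

Printing the two averages separately makes the point directly: a run with 30% fast errors can look "faster" than a healthy run if errors are ignored.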

Andrei: I suppose I agree with Volodya. It is not so much a question of analysis as of data preparation. Usually the data is either collected incorrectly or there is not enough of it.

For example, we ran some migration and got a success in the report: say, 60 thousand objects were processed. Not knowing the specifics of that business process, we went and wrote in the report: "Everything is OK." In fact it turned out that those 60 thousand objects had produced errors of some kind. All that had to be done was to look at the right table or, in the simpler case, open a special summary table and check whether the operation had succeeded. Since that was not done, the measurement is not considered valid and cannot be used to analyze the application's speed.

JUG.ru: Synthetic data versus natural data for load testing: let's define the terminology. What counts as synthetic data, and what as natural?

Andrei: It seems to me a car analogy is appropriate here. There is fully natural oil (mineral), there is 100% synthetic, and sometimes there is a blend. Testing is the same: very often we make a mix of synthetic and natural data, i.e. we take the data provided by the customer as the basis (the customer, of course, masks anything considered private or threatening to his intellectual property), and on top of it we generate additional objects or increase the nesting of the existing ones. Depending on how much we generated, the share of synthetics is higher or lower.

When there is no input from the customer, we generate completely synthetic data. Most often this happens when the solution is written from scratch and we have to figure out what the data might look like.
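
What "generating additional objects or increasing the nesting of existing ones" can mean in practice, sketched with an invented customer/subscription model: take a masked natural record as the seed and clone it with fresh IDs and deeper nesting.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Grow a synthetic dataset from a (masked) natural seed record:
    // clone customers with fresh IDs and add extra nested subscriptions.
    // The data model here is invented for illustration.
    public class SyntheticDataGenerator {
        record Subscription(String plan) {}
        record Customer(long id, String name, List<Subscription> subs) {}

        public static void main(String[] args) {
            Random rnd = new Random(42);                    // reproducible runs
            Customer seed = new Customer(1, "MASKED_NAME_1",
                    List.of(new Subscription("GSM_BASIC")));

            List<Customer> dataset = new ArrayList<>();
            for (long id = 2; id <= 1_000; id++) {
                List<Subscription> subs = new ArrayList<>(seed.subs());
                // increase nesting: 0..4 extra subscriptions per clone
                for (int i = rnd.nextInt(5); i > 0; i--) {
                    subs.add(new Subscription("ADDON_" + rnd.nextInt(100)));
                }
                dataset.add(new Customer(id, "MASKED_NAME_" + id, subs));
            }
            System.out.println(dataset.size() + " synthetic customers generated");
        }
    }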

About the importance of natural data


JUG.ru: Is it possible to single out situations when it is natural data that is important?

Vladimir: I have not come across such a case. But if we extend synthetics to external systems, difficulties can arise. It is not always possible to emulate an external system (some piece of hardware, etc.) correctly. Reproducing all of its characteristics, the time delays, the number of simultaneous connections, and so on, is hard. You can probably build some kind of emulator, but it is often easier to take a ready-made external system than to try to capture all those details.

Andrei: Indeed, we write stubs, and they return a very primitive answer. But there are systems that return XML so huge that the devil himself would break a leg in it. Instead of synthetically generating a monster like that, it is easier to take real data.

Vladimir: Here is an example: when we integrate with a mainframe system, it is easier to take an actually working mainframe (not production, but a test copy of it) and work with it than to create a bunch of stubs, load data into them, and so on. A system written, roughly speaking, 20 or more years ago is one that nobody understands anymore.

To put the question on a different plane: you can always replace natural data with synthetics, but there are real constraints, such as development timeframes and the resources required (money and time). And there are cases where it is easier to take a ready-made system and copy it than to try to describe its logic.

JUG.ru: Are there "inappropriate" natural data?

Vladimir: Yes. There is a simple example involving search.
Search is one of the typical problem areas. Having the data itself is only half the battle; you also need to know which search terms will be used. In particular, if we constantly search for the same thing (say, John Smith), the system gets used to it, caches it, and everything looks fine. If we constantly randomize and search for completely different things, the system gets somewhat rattled (we may be applying a heavier load than the production system ever sees). All these subtleties strongly affect the measurement results.
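
The difference between "always John Smith" and fully random terms is easy to reproduce in a driver script; a sketch with an invented term pool. A realistic test usually sits between the two extremes, drawing from a pool with a skewed distribution.

    import java.util.List;
    import java.util.Random;

    // Choosing search terms for a load test: a constant term lets caches warm
    // up and flatters the results; uniform random terms may overstate the load.
    // The term pool and the 80/20 skew are illustrative.
    public class SearchTermPicker {
        private static final List<String> POOL = List.of(
            "John Smith", "Jane Doe", "ACME Corp", "Ivan Petrov", "Order 42");
        private static final Random RND = new Random();

        static String alwaysTheSame() { return "John Smith"; }   // cache-friendly

        static String uniformlyRandom() {                        // cache-hostile
            return POOL.get(RND.nextInt(POOL.size()));
        }

        static String skewed() {  // ~80% of searches hit the first 20% of the pool
            int hot = Math.max(1, POOL.size() / 5);
            return RND.nextDouble() < 0.8
                ? POOL.get(RND.nextInt(hot))
                : POOL.get(hot + RND.nextInt(POOL.size() - hot));
        }

        public static void main(String[] args) {
            for (int i = 0; i < 5; i++) System.out.println(skewed());
        }
    }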

It should be noted that copying the same data does not at all mean that our load test will give the same answer as the production system.

That is because the data, even after an export-import using database tools or anything else, will land differently on our hardware.

If the real system has been running for years, its data was created and deleted gradually, and historical records are interleaved with new ones. After exporting and importing to the test server, it may turn out that all the data lies side by side, tight and compact. As a result, all scenarios run much faster than they actually do in production. Search scenarios are the most susceptible, but the effect shows up in others too. To get rid of this inconsistency, you have to move toward replicating the environment itself: not just copying the data, but repeating it byte for byte.

If we are talking about an Oracle database, you need to take a clone of the files, not of the data itself. We have had cases where the customer's test environment was deployed from a clone of the database, at the file level rather than the data level, precisely in order to reproduce these scenarios in load testing.

JUG.ru: How do you check the "quality" of natural data? Do you need some kind of instinct?

Vladimir: There should always be an instinct. But there is another way: compare the tests against results from production. If some scenario is slow in production, we can repeat it in our test environment and see how long it takes there. It is as simple as that.

Andrei: For my part, I want to add that natural data is data taken at a specific point in time. And it can rot.
There is a well-known project I worked on for almost two years. We took natural data from 2007 or 2008, and over time it became completely irrelevant. But since there was no other way (and still isn't), we use that natural data to generate more relevant synthetics.

JUG.ru: What problems do you encounter when using a copy of corporate data as a basis for "natural" tests?

Andrei: First of all, this data is tied to existing users and their real IDs.

Vladimir: A typical example is email addresses taken from production: they are real, and they have to be masked before testing.


The Heisenbug conference will take place on December 10 at the «Radisson Slavyanskaya» hotel.


Source: https://habr.com/ru/post/314346/

