
Overview of tools for load and performance testing

As brave people like to say: "from dev to prod is just one step." Experienced people add that this step is called "testing", and testing of the most varied kinds at that, and we simply have no reason not to believe them.



Load matters: the driver of this truck managed to bring down a bridge with the weight of his vehicle, and the repair bill came to about $21.3M. Fortunately, software testing is cheaper!
Of course, when speaking about testing, we need to understand what exactly we are fighting for and with what. We deliberately limited ourselves and decided to talk today exclusively about load testing and performance testing: two topics that lie quite far apart, yet are extremely interesting in the most practical sense. We will look at tools for both without tying ourselves to any particular technology stack, so do not be surprised to see Yandex.Tank and BenchmarkDotNet side by side!

Load testing


Imagine that we have written some service, and now we need to understand how much load it can take. The old sad joke of quality software development says that it is better to have software that is guaranteed to work badly than software that works well but without any guarantees: in the first case we at least know what we can count on. Furthermore, if our service can scale in one way or another, we need to understand how useful that scaling turns out to be as the load grows, and whether it does the job the project expects of it.

So we point the load at our creation and carefully watch the result: we are obviously interested in the moment when the service either starts responding to requests with unacceptable delay, or returns incorrect data, or stops showing any signs of life, whether for all requests or only for some of them.

Let's say, for definiteness, that it is a web service, although that is not particularly important. To find out what we can count on, we start "bombarding" it with requests, observing both the behavior of the service itself and the load on the servers it runs on. It is good if we know in advance exactly which requests to send: in that case we can prepare the whole set of requests beforehand and then fire it at our application in one go. If a request depends on the result of a previous one (a simple example: the user logs in first, and subsequent calls to the service must carry the session ID), then the load generator has to be able to build test requests extremely quickly, in real time.
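To make the difference concrete, here is a minimal sketch of both cases in Java, using only the JDK's built-in HTTP client (Java 11+); the URLs, the request counts and the X-Session-Id header are invented for illustration and are not taken from any of the tools discussed below.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class TinyLoadGenerator {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Case 1: independent requests prepared in advance and fired all at once.
        HttpRequest ping = HttpRequest.newBuilder(URI.create("http://localhost:8080/ping")).build();
        List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            inFlight.add(client.sendAsync(ping, HttpResponse.BodyHandlers.ofString()));
        }
        inFlight.forEach(CompletableFuture::join); // wait until every response has arrived

        // Case 2: dependent requests: the next request can only be built
        // after the previous response has been read (here, a session id).
        HttpRequest login = HttpRequest.newBuilder(URI.create("http://localhost:8080/login")).build();
        HttpResponse<String> loginResponse = client.send(login, HttpResponse.BodyHandlers.ofString());
        String sessionId = loginResponse.headers().firstValue("X-Session-Id").orElse("");

        HttpRequest profile = HttpRequest.newBuilder(URI.create("http://localhost:8080/profile"))
                .header("X-Session-Id", sessionId) // hypothetical header name
                .build();
        client.send(profile, HttpResponse.BodyHandlers.ofString());
    }
}
```

A real load generator adds pacing, many workers and statistics on top of this, but the distinction between "a batch prepared in advance" and "requests built from live responses" is exactly the distinction that matters when choosing among the tools below.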

Taking these circumstances and our knowledge of the system under test into account, we choose the tool (or tools):

JMeter


Yes, good old JMeter. For almost twenty years now (!) it has been a frequent choice for many kinds and variants of load testing: a convenient GUI, platform independence (thanks, Java!), multithreading support, extensibility, excellent reporting capabilities, and support for many request protocols. Thanks to its modular architecture, JMeter can be extended in whatever direction the user needs, covering even very exotic test scenarios, and if none of the plug-ins written by the community over the years fits, you can take the API and write your own. If necessary, JMeter can also run distributed tests, albeit in a limited form, with the load generated by several machines at once.
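As a taste of that extensibility, here is a sketch of a custom sampler built on JMeter's Java Sampler API (the class name and the simulated action are invented; in a real plug-in the sleep would be replaced by a call to the system under test). Packaged into a jar and placed where JMeter can see it (typically lib/ext), it can be driven from an ordinary Thread Group via the "Java Request" sampler:

```java
import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

public class ExoticProtocolSampler extends AbstractJavaSamplerClient {

    @Override
    public SampleResult runTest(JavaSamplerContext context) {
        SampleResult result = new SampleResult();
        result.sampleStart();                  // start the timer JMeter will report
        try {
            // Here a real sampler would talk to the system under test
            // over whatever exotic protocol it speaks.
            Thread.sleep(10);                  // placeholder for the actual call
            result.setSuccessful(true);
            result.setResponseCodeOK();
        } catch (Exception e) {
            result.setSuccessful(false);
            result.setResponseMessage(e.toString());
        } finally {
            result.sampleEnd();                // stop the timer
        }
        return result;
    }
}
```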

One of JMeter's convenient features is its proxy mode: we set "127.0.0.1:8080" as the proxy in the browser settings and click through the pages of the site we are interested in, while JMeter records all our actions and the related requests as a script that can later be edited as needed. This makes creating HTTP tests much easier.

By the way, the latest version (3.2), released in April of this year, learned to send test results to InfluxDB via asynchronous HTTP requests. True, starting with version 3.2 JMeter requires Java 8, but that is probably not too high a price for progress.

Test scripts in JMeter are stored as XML files, which, as it turns out, creates plenty of problems: they are quite inconvenient to write by hand (in practice you need the GUI to create a test), and just as inconvenient to keep in a version control system (especially when you need to diff them). Competing load testing products such as Yandex.Tank or Taurus have learned to generate these files on the fly and hand them to JMeter for execution, thereby reusing JMeter's power and maturity while letting users describe tests in a more readable form that is easier to store in version control.

LoadRunner


Another product that is very well known on the market and in certain circles, and whose wider adoption has been held back by the vendor's licensing policy (by the way, after the merger of Hewlett Packard Enterprise's software division with Micro Focus International, the familiar name HPE LoadRunner has changed to Micro Focus LoadRunner). What is interesting is its test model, in which several (it is probably more accurate to say "many") virtual users work with the application under test in parallel. This makes it possible not only to assess the application's ability to handle a stream of simultaneous requests, but also to understand how users actively doing something with the service affect the work of others. And all this with a wide choice of protocols for interacting with the application under test.

In its time HP created a very good suite of automation tools for functional and load testing that can be integrated into the software development process, and LoadRunner integrates with them (in particular, with HP Quality Center and HP QuickTest Professional).

Some time ago the vendor decided to turn toward those who are not ready to pay for a license up front: LoadRunner is now available with a free license (limited to 50 virtual users and excluding a small part of the supported protocols), and money is charged only for going beyond those limits. It is hard to say how much this will increase interest in this undoubtedly entertaining tool, given that it has such strong competitors.

Gatling


A very powerful and serious tool (it is not for nothing that it is named after a rapid-fire machine gun), above all because of its performance and the breadth of protocols supported out of the box. For example, where load testing with JMeter is slow and painful (alas, the plug-in for working with WebSockets is not very fast, which rather conflicts with the whole point of WebSockets), Gatling will almost certainly create the necessary load without much difficulty.

It should be noted that, unlike JMeter, Gatling has no GUI and is generally considered a tool for an experienced, "competent" audience that is comfortable writing a test scenario as a text file.

Gatling also has downsides for which it gets criticized. First, the documentation could be better. Second, to work with it you really need to know Scala: both Gatling itself and its test scripts are written in that language. Third, in the past the API has "occasionally" changed drastically, so tests written six months earlier could turn out not to run on the new version, or to require rework and migration. Gatling also lacks distributed testing, which limits its possible applications.

Yandex.Tank


In short, Yandex.Tank is a wrapper around several load testing utilities (including JMeter) that provides a unified interface for configuring tests, running them and generating reports, regardless of which utility works "under the hood".

It can monitor the main metrics of the application under test (CPU, memory, swap and so on) and of system resources (free memory and disk space), and it can stop a test based on various clear criteria ("if the response time exceeds a specified value", "if the number of errors per unit of time is higher than X", and so on). It can also display the main test statistics in real time, which is very useful right in the middle of a run.

The Tank has been used inside Yandex and at other companies for about 10 years. The most varied services have been bombarded with it, with very different requirements for scenario complexity and load level. Almost always a single load generator is enough, even for testing heavily loaded services. The Tank supports different load generators, both ones written specifically for it (Phantom, BFG, Pandora) and well-known third-party ones (JMeter). The modular architecture lets you write a plug-in for the load generator you need and, in general, bolt on almost anything.

Why different load generators at all? Phantom is a fast "cannon" written in C++. A single generator of this kind can produce up to hundreds of thousands of requests per second. But to achieve this speed the requests have to be generated in advance, and it is not possible to use data received from the service under test to build the next request. When you need to run a complex scenario, or the service uses a non-standard protocol, you should turn to JMeter, BFG or Pandora.

BFG, unlike JMeter, has no GUI; test scenarios are written in Python, which means any of its huge number of libraries can be used. It often happens that Python bindings already exist for the service, and then it is convenient to use them when writing load scenarios. Pandora is an experimental gun written in Go, fast enough and extensible, suitable for tests over HTTP/2 and for cases where fast scenarios are needed.

Inside Yandex a dedicated service is used to store and display the results of load tests. A simplified public counterpart called Overload is now available: it is completely free and is used, among other things, for testing open-source libraries and for running competitions.

Taurus


Taurus is another framework on top of several load testing utilities. You might like this product: it takes an approach similar to Yandex.Tank, but with a somewhat different feature set and, perhaps, a more sensible configuration file format.

In general, Taurus works well when the power of, say, Gatling matters for building a test, but there is no desire or opportunity to deal with Gatling itself (or to write test scripts in Scala): it is enough to describe the test in the much simpler Taurus file format and configure Gatling as the load generation tool, and all the Scala files will be generated automatically. "Automating the automation", so to speak!

Taurus can be configured to send test statistics to the BlazeMeter.com online service, which displays the data as tidy graphs and tables. The approach is somewhat unusual but noteworthy: the reporting engine clearly improves over time and will gradually present the information even more pleasantly.

Performance testing


Testing the performance of a service or application can and should be done not only after development is finished, but also during it, just as we run regular unit or regression tests. Properly organized, regular performance tests allow us to answer a very subtle question: have the recent changes in the application code degraded the performance of the resulting software?

It would seem that measuring performance is easy: take a timestamp twice (preferably with high precision), compute the difference, add and divide a little, and everything is ready to be optimized. Not so! Although the question sounds simple, measurements of this kind are in fact rather difficult to make, and comparing the results of different measurements is not always reasonable at all. One reason: for the results to be comparable, the tests must run over the same source data, which among other things implies re-creating the test environment for every run; another is that the subjective perception of how long a test scenario takes can be inaccurate.
Yet another reason is the difficulty of isolating how the work of one individual module, the one we are currently changing, affects the performance of the whole application. To make matters worse, isolating that influence is even harder when more than one developer is working on the code.
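The naive approach looks roughly like the sketch below (plain Java, with a made-up doWork method standing in for the code under test); the comments point out why the "two timestamps and a subtraction" recipe is so fragile.

```java
public class NaiveTiming {
    public static void main(String[] args) {
        // Ten repeated measurements of the same code over the same data.
        for (int run = 0; run < 10; run++) {
            long start = System.nanoTime();           // high-resolution timestamp
            doWork();                                 // the code we want to measure
            long elapsed = System.nanoTime() - start;
            System.out.printf("run %d: %d ns%n", run, elapsed);
            // The numbers typically drift between runs: the first iterations execute
            // before the JIT has compiled the hot path, the GC can pause any run,
            // and other processes compete for the same CPU. A single number hides
            // all of that variance, which is exactly why benchmark harnesses exist.
        }
    }

    private static void doWork() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        if (sum == 42) {
            System.out.println(sum);                  // keep the loop from being optimized away
        }
    }
}
```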

One approach in such a situation is to carefully build a full-fledged test scenario that reproduces a real client's work with the service and to run it many times, while analyzing in parallel the load on the server where the test runs (this also shows how much load the scenario puts on individual server resources, which gives additional hints about where performance deserves a closer look). Alas, in a real project you cannot always afford this: such a large scenario, repeated 10-20 times, is likely to take far too long to run very often, and that kills the whole idea.

The second approach, better suited to the development process, is to organize small-scale, "micro" or even "nano" testing of individual pieces of code (say, running one method or one function, but a large number of times; in other words, benchmarking). Planning such tests requires extra effort from the developer, but the result pays off both in the overall performance of the code and in an understanding of how individual parts of the project behave as you work on them and on the parts around them. Here, for example, are a couple of performance testing tools:

JMH


JMH (Java Microbenchmark Harness) is a Java harness for building, running and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM. It is a relatively young framework in which the developers have tried to account for all the nuances of the JVM, and one of the most convenient tools to have at hand. JMH supports the following measurement modes: Throughput (operations per unit of time), AverageTime (average execution time), SampleTime (percentiles of the execution time) and SingleShotTime (the time of a single method invocation, relevant for measuring the "cold" start of the code under test).

Since we are talking about Java, the framework also takes JVM mechanics such as caching and JIT compilation into account: before the actual measurement it executes the test code several times to "warm it up".
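A minimal JMH benchmark might look like the sketch below (assuming the jmh-core and annotation-processor artifacts are on the classpath; the benchmarked method and its parameters are invented for illustration):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode({Mode.Throughput, Mode.AverageTime}) // two of the modes mentioned above
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)       // the warm-up runs JMH performs before measuring
@Measurement(iterations = 10)
@Fork(1)
public class StringConcatBenchmark {

    @Param({"10", "100"})
    int parts;                // the benchmark is run once per parameter value

    @Benchmark
    public String concatWithBuilder() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts; i++) {
            sb.append(i);
        }
        return sb.toString();  // returning the result prevents dead-code elimination
    }
}
```

Such a class is usually run through the JMH Maven or Gradle plug-ins, or via the org.openjdk.jmh.Main entry point, which takes care of forking, warm-up and the statistics.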

BenchmarkDotNet


BenchmarkDotNet takes on the routine work of writing benchmarks for .NET projects and provides rich possibilities for formatting the results with minimal effort. As the authors say, there is no shortage of feature requests, so BenchmarkDotNet still has room to grow.

As of today, BenchmarkDotNet is primarily a library for benchmarks rather than for performance tests. Serious work is under way to make the library usable on a CI server for automatic detection of performance regressions, but that work has not been completed yet.

Google Lighthouse


Frontend performance measurements have always stood somewhat apart: on the one hand, delays are often caused by how quickly the backend responds; on the other hand, users often judge the whole application by the behavior of the frontend (more precisely, by how quickly it reacts), especially when it comes to the web.

In the web frontend, performance measurement is now moving toward the Performance API and toward measuring exactly the parameters that matter for a particular project. The webpagetest.org service, combined with Performance API marks and measurements, is a good help here: it lets you see the picture not from your own computer but from one of the many test locations around the world, and assess how the time spent receiving and transmitting data over the network affects the frontend.

This product would look more like a checker of site pages against Google's recommendations (and best practices in general) for both websites and Progressive Web Apps, were it not for one of its functions: among the checks there is a test of how the site behaves on a poor connection, as well as with no connection at all. This does not correlate much with performance testing as such; however, if you think about it, in some cases a web application is perceived as "slow" not because it prepares data slowly, but because the conditions it runs in, on the user's machine, in their browser, over their internet connection, are, alas, far from perfect. Google Lighthouse lets you evaluate exactly this effect.



Yes, the topic of benchmarks and testing is endless. A whole post, and more than one, could and should be written about each of these tools. But as we all know, the most interesting thing is not just to read, but to talk, to listen, and to question a knowledgeable person who, by virtue of their experience, will warn you in advance about the many small and large difficulties lying in the way of mastering a given technology.

Therefore, we are pleased to invite you to the Heisenbug 2017 Moscow conference, which will be held on December 8-9, 2017 and where, in particular, a number of talks on these topics will be presented.


Details and conditions of participation can be found on the conference website.

Source: https://habr.com/ru/post/337928/

