Create a cloud for software testing

While companies like Google and Microsoft are busy telling ordinary users about the happiness that awaits them in cloud services, I want to share the other side of the clouds: happiness for software developers and testers. In the few years that I have led the Parallels Plesk Panel testing team, we have built up a good collection of life hacks for using the cloud for our own purposes.

I am sure this experience will be useful to the vast majority of companies and startups. First, you can build a test cloud yourself, which matters when your budget is limited. Second, the very first version of a test cloud can be deployed on two or three servers. Third, all the effort that goes into creating a test cloud is more than repaid by the automation of the testing process. This is especially critical if you release updates regularly and your project's codebase is large.

That is it for the preamble. If you are curious, read on.

Why do we need a cloud?

Parallels Plesk Panel is installed on about 50% of all servers used for hosting in the world. That is millions of physical servers and VPSs running hundreds of millions of customer sites. The cost of a bug in this software is huge, and nothing is trivial. This makes Plesk a very demanding product for QA engineers. Every day we need to run about 2,000 p0 and p1 regression autotests (about 700 of them dedicated to the user interface) in 60 configurations. That adds up to more than 120,000 autotest runs in 24 hours (roughly 2,000 tests × 60 configurations). On top of that, at least once a week there is more work: we always run tests that check upgrade / backup / restore / migration from the seven supported versions of Plesk on dozens of configurations. Once a month, QA engineers carry out performance, density and load testing. And things get really hot on the eve of a release, when the need to verify integration with dozens of products is added to all of the above.

Obviously, doing all of this by hand (preparing the environment, installing products, uploading frameworks and tests, running them on separate servers, collecting and analyzing logs and results) is not just difficult, it is impossible in principle. That is why we decided to hand these tasks over to the cloud.

What will it be?

The wish list of the Plesk QA team (and of the Parallels development center in Novosibirsk as a whole, where a number of other products for service providers are developed) took shape in 2008 and was condensed into four concise points:

Reliability and fault tolerance. As practice later showed, uptime turned out to be almost as important a criterion for a QA cloud as it is for public cloud services.
High performance. The criteria here are the ability to complete the full scope of testing within the required time and the speed of creating a virtual machine (read: VPS).
Scalability. The ability to allocate the necessary amount of resources at any time, depending on the complexity of the task. For example, for a simple task we allocate one machine and get the result in an hour, and for a complex task we allocate 20 machines and also get the result in an hour. Plus extensibility: preparing a new, fully configured server takes no more than one hour.
Support for changing requirements. The ability to quickly adapt the cloud to changing requirements (new types of virtualization, prioritization of tasks between projects, etc.).
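
The scalability point comes down to simple arithmetic: given the total running time of a test plan and a deadline, how many machines have to be allocated? Below is a minimal sketch, assuming the tests split evenly across machines and ignoring deployment time. The function name and the numbers are illustrative and are not taken from our cloud.

```python
import math


def machines_needed(total_test_minutes: float, deadline_minutes: float) -> int:
    """Return the smallest number of machines that finishes the work by the deadline.

    Assumes the tests can be split evenly across machines (a simplification).
    """
    if deadline_minutes <= 0:
        raise ValueError("deadline must be positive")
    return max(1, math.ceil(total_test_minutes / deadline_minutes))


# A simple task: 50 minutes of tests fits on one machine within the hour.
print(machines_needed(50, 60))    # 1
# A complex task: 20 hours of tests needs 20 machines to finish within the hour.
print(machines_needed(1200, 60))  # 20
```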

A year later, a "cloud" that could perform the basic tasks (deploying machines, running autotests) was ready. Two or three people worked on it. Virtually all of the cloud software is written in-house, with the exception of a couple of off-the-shelf products: Munin (resource utilization monitoring) and Nagios (resource availability monitoring). TestLink (which stores the textual descriptions of test cases and compiles the results of manual and automated test runs), on the other hand, had to be heavily reworked. We modified both its core and its reporting system. The latter change, for example, sped up report generation by almost two orders of magnitude: from 5 minutes to 5 seconds.

An informed reader will ask why we did not simply use the Amazon cloud and spin up instances there for our needs. I will answer point by point:

We have no shortage of hardware. It had been accumulating for many years, so it made sense to put it to full use.
Even now, external clouds do not fully meet our requirements, which include many specific configurations and types of virtualization.
It is expensive. The daily volume of resources our tasks consume would cost an astronomical sum.

Moment of truth: how has the cloud changed the lives of testers?

Before the cloud, we deployed several configurations by hand, uploaded fresh product builds, installed them, uploaded the autotests, ran them, periodically checked that they had not crashed, collected the run results from all the machines and analyzed them. It was slow, hard and very inefficient.

And now? The cloud has freed the QA team from routine tasks. The tester has stopped being “a bit of a sysadmin” and no longer rolls out OSes, product builds and test plans onto servers by hand. A robot does that now, and it can be controlled through a GUI or an API.

Here's what it looks like:

Select the product and version (1), the test plan name (2), the build (3) and the list of configurations (4), press the 'Run' button (5), and off it goes. Deploying dozens or hundreds of machines, installing the product on them and launching the autotests takes just a few clicks.
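
The same launch can also be expressed programmatically. The sketch below is hypothetical: the `RunRequest` fields mirror the inputs of the form, but the classes, the `expand` function and all sample values are assumptions made for illustration, not the actual API of our cloud.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunRequest:
    product: str               # (1) product ...
    version: str               # ... and version
    test_plan: str             # (2) test plan name
    build: str                 # (3) build identifier
    configurations: List[str]  # (4) target configurations


@dataclass(frozen=True)
class Job:
    configuration: str
    test_plan: str
    build: str


def expand(request: RunRequest) -> List[Job]:
    """Turn one launch request into one job per configuration."""
    return [Job(c, request.test_plan, request.build) for c in request.configurations]


# All values below are made-up placeholders.
request = RunRequest(
    product="Plesk",
    version="example-version",
    test_plan="BVT",
    build="build-123",
    configurations=["centos6-x64", "debian6-x64", "win2008r2"],
)
for job in expand(request):
    print(job)
```

One job per configuration is the natural unit of work here, because each configuration needs its own freshly deployed machine.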

On the Scheduled tasks tab you can set up regular automated test runs through our task manager, which schedules them with the current load on the cloud in mind.
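
To give an idea of what load-aware scheduling can look like, here is a minimal sketch. It is not the algorithm of our task manager: the `Task` fields, the priority scheme and the "fits into free capacity" rule are all assumptions chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Task:
    priority: int                         # lower value = more urgent
    name: str = field(compare=False)
    machines: int = field(compare=False)  # machines the task needs


def pick_runnable(queue: List[Task], free_machines: int) -> List[Task]:
    """Pop tasks in priority order while they fit into the free capacity.

    Tasks that do not fit stay in the queue for the next scheduling pass.
    """
    started, deferred = [], []
    while queue:
        task = heapq.heappop(queue)
        if task.machines <= free_machines:
            free_machines -= task.machines
            started.append(task)
        else:
            deferred.append(task)
    for task in deferred:
        heapq.heappush(queue, task)
    return started


queue: List[Task] = []
heapq.heappush(queue, Task(2, "nightly-regression", 40))
heapq.heappush(queue, Task(1, "bvt", 5))
heapq.heappush(queue, Task(3, "upgrade-matrix", 30))

print([t.name for t in pick_runnable(queue, free_machines=50)])  # ['bvt', 'nightly-regression']
print([t.name for t in queue])                                   # ['upgrade-matrix']
```

Note that this simple rule lets a small low-priority task start ahead of a large high-priority one that does not fit yet. Whether that is desirable depends on your projects.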

And the Mass execution tab lets you launch groups of test plans. A couple of clicks, and you are running half a million autotests. You can revel in your own greatness.

Under the hood of the test cloud

This animated diagram illustrates how our cloud works and gives an idea of its architecture:

Let's see how the gears turn inside, using our continuous integration scheme as an example.

The process is initiated by the developers. Once they have finished another feature, they commit it to the version control system.
After that, the build from source starts.
As soon as a build is ready for one of the configurations, a request is immediately sent to the Test executor to launch the Build Verification Test (BVT), a test plan of 10 to 20 autotests that check the main functionality of the product and make sure there are no blocking problems.
The Test executor sends a request to the Deployment server to create the appropriate environment.
The Deployment server selects a pre-prepared operating system image from the OS dump storage ...
... and deploys it on the most suitable server or group of servers, which the Load balancer selects according to certain rules (we will look at some of them later). After that, the new build is installed on all the created machines.
As soon as the Deployment server has prepared all the necessary environments, control returns to the Test executor, which runs the BVT test plan on the prepared machines, distributing the autotests among them.
Depending on the nature of the autotests, servers with external services may be used: for example, Selenium, external mail or databases.
If the test plan finished with errors, the developer who made the faulty commit receives a notification that everything is broken and needs to be fixed quickly. If the test plan finished without errors, the resulting builds are moved to a dedicated Product builds storage server, from where they become available for manual and full-scale automated testing.
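
The steps above can be sketched as a single orchestration function. This is a simplified illustration, not our actual code: every name is hypothetical, the build, deploy, test, notify and publish steps are passed in as callables, and configurations are processed one after another, whereas the real cloud works in parallel and picks servers through the Load balancer.

```python
from typing import Callable, Dict, List


def run_bvt_pipeline(
    commit: str,
    configurations: List[str],
    build: Callable[[str, str], str],
    deploy: Callable[[str, str], str],
    run_tests: Callable[[str], bool],
    notify: Callable[[str, str], None],
    publish: Callable[[str], None],
) -> Dict[str, bool]:
    """Build each configuration, deploy it, run the BVT, then publish or notify."""
    results: Dict[str, bool] = {}
    builds: List[str] = []
    for config in configurations:
        artifact = build(commit, config)      # build from source
        machine = deploy(config, artifact)    # OS image plus product installation
        results[config] = run_tests(machine)  # BVT on the prepared machine
        builds.append(artifact)
    if all(results.values()):
        for artifact in builds:
            publish(artifact)                 # move to the Product builds storage
    else:
        failed = ", ".join(c for c, ok in results.items() if not ok)
        notify(commit, f"BVT failed on: {failed}")
    return results


# Stubs standing in for the real services.
published, messages = [], []
results = run_bvt_pipeline(
    commit="abc123",
    configurations=["centos6", "win2008"],
    build=lambda commit, config: f"{commit}-{config}.pkg",
    deploy=lambda config, artifact: f"vm-{config}",
    run_tests=lambda machine: machine != "vm-win2008",  # simulate one failure
    notify=lambda commit, text: messages.append((commit, text)),
    publish=published.append,
)
print(results)   # {'centos6': True, 'win2008': False}
print(messages)  # [('abc123', 'BVT failed on: win2008')]
```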

There are a few more nodes on the diagram that I have not mentioned. Inside the cloud are the Tests storage, which holds the test frameworks and autotests, and the Test specification system, which hosts TestLink. Outside it are the Tools management server, which provides the interfaces for working with the cloud (for example, the test plan launch form mentioned above), and the Infrastructure monitoring server, which lets you keep an eye on the cloud, quickly detect problems in it, find bottlenecks, etc.

Of course, the cloud in this form did not appear immediately. It evolved (and continues to change) in accordance with the objectives and our ideas about its effective use.

Lifehacks and improvements

One of the most significant life hacks in our test cloud is the reporting system. Its advantage over stock TestLink is clarity. TestLink is only good when tests pass without errors; otherwise it is practically useless. Take a look at the screenshot below.

Do 80 failures mean 80 bugs in the product? 80 problems in the tests? Or one bug in the product that makes 80 tests fail? What state is the product in? What exactly broke? Where are the logs? Many questions and few answers.

Now the results are reported to LogTracker, a system we developed in-house.
The screenshot shows one of its pages, with the results of running one test plan on a particular build.

The upper left part shows statistics for each configuration: the number of passed and failed autotests and the number of known errors. Clicking a configuration gives you information about each autotest, logs at different levels of detail, screenshots, links to known bugs in the bug tracking system and much more.
The top right shows a list of autotests sorted by the number of failures across configurations. With limited resources, this lets engineers focus on the “right” problems (those more likely to be a bug in the product, or an outdated autotest broken by changes in the product).
The bottom left shows known errors, the number of failures and links to the corresponding entries in our bug tracking system (these include both bugs in the product and bugs in the tests).
The bottom right shows overall statistics for this test plan and build, including unknown errors and errors cured by "quarantine". Quarantine is what we call the automatic rerun of failed tests in a completely fresh environment. It lets us weed out "false" autotest failures caused by network instability or by Selenium servers struggling under heavy load.
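
The quarantine idea is easy to sketch. The code below is an illustration under assumptions, not LogTracker's implementation: `run_test` is a hypothetical callable that takes a test name and a flag requesting a fresh environment, and each failed test is rerun exactly once.

```python
from typing import Callable, Dict, Iterable


def run_with_quarantine(
    tests: Iterable[str],
    run_test: Callable[[str, bool], bool],
) -> Dict[str, str]:
    """Run each test; rerun failures once in a completely fresh environment.

    run_test(name, fresh_env) returns True when the test passes.
    """
    verdicts: Dict[str, str] = {}
    for name in tests:
        if run_test(name, False):
            verdicts[name] = "passed"
        elif run_test(name, True):
            verdicts[name] = "cured by quarantine"
        else:
            verdicts[name] = "failed"
    return verdicts


def fake_run(name: str, fresh_env: bool) -> bool:
    """Simulated test runner used only for this example."""
    if name == "test_login":
        return True
    if name == "test_mail_flaky":
        return fresh_env  # fails first, passes in a fresh environment
    return False          # a genuine failure


print(run_with_quarantine(["test_login", "test_mail_flaky", "test_backup"], fake_run))
# {'test_login': 'passed', 'test_mail_flaky': 'cured by quarantine', 'test_backup': 'failed'}
```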

Please note: the entire body of data the QA engineer works with is collected and generated automatically.
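
As an example of such automatic aggregation, the ranking of autotests by the number of failures across configurations can be approximated in a few lines. This is a sketch under assumptions: the `(configuration, test name, passed)` record format and the sample data are invented for illustration and do not reflect LogTracker's internal format.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_failures(results: Iterable[Tuple[str, str, bool]]) -> List[Tuple[str, int]]:
    """Count failures per autotest across configurations, most-failing first.

    Each result is a (configuration, test_name, passed) tuple.
    """
    failures = Counter(test for _config, test, passed in results if not passed)
    return failures.most_common()


sample = [
    ("centos6", "test_backup", False),
    ("debian6", "test_backup", False),
    ("win2008", "test_backup", False),
    ("centos6", "test_dns", False),
    ("debian6", "test_dns", True),
    ("centos6", "test_login", True),
]
print(rank_failures(sample))  # [('test_backup', 3), ('test_dns', 1)]
```

A test that fails in every configuration more likely points to a product bug or an outdated test than to a flaky environment, which is why this ordering is useful for triage.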

Conclusion

If you have made it to the conclusion, congratulations. You are genuinely interested in using the cloud for QA, and you may well want to build your own test cloud, at least on two or three servers. Here are my recommendations for those doing it for the first time:

Be clear about your requirements for the cloud. Understand which tasks the cloud should help solve, what performance and stability criteria it must meet, etc.
Decide on the necessary resources based on the performance and stability requirements: what hardware and network setup, what software and virtualization tools, and what human resources you need.
Develop a system for distributing load between servers and define the criteria for choosing a suitable server. To be honest, I deliberately left out the chapter on the load balancer so as not to scare readers off with the size of the post. If you would like to read about the balancer, write in the comments and I will make a separate post.
Create a cloud monitoring system that lets you quickly find problems and bottlenecks in the cloud.
Create a cloud management system (at first it may be just an API; later you will need a GUI that makes the cloud faster to use and demands less technical knowledge).
Choose a test management system and, if necessary, modify it, as we did with TestLink.
Decide on a test framework. The choice will largely depend on the technology your product is built on and the programming languages it is written in.
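
For the load distribution recommendation, a generic starting point is a "least-loaded suitable server" rule. The sketch below is not a description of our balancer, whose rules are not covered in this post: the `Server` fields, the virtualization labels and the selection criterion are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    virtualization: str  # illustrative labels, e.g. "containers" or "hypervisor"
    capacity: int        # maximum number of VMs
    running: int         # VMs currently running

    @property
    def load(self) -> float:
        return self.running / self.capacity


def choose_server(servers: List[Server], virtualization: str) -> Optional[Server]:
    """Pick the least-loaded server of the required type that still has a free slot."""
    candidates = [
        s for s in servers
        if s.virtualization == virtualization and s.running < s.capacity
    ]
    return min(candidates, key=lambda s: s.load, default=None)


servers = [
    Server("node1", "containers", capacity=20, running=18),
    Server("node2", "containers", capacity=20, running=5),
    Server("node3", "hypervisor", capacity=10, running=1),
]
print(choose_server(servers, "containers").name)  # node2
print(choose_server(servers, "bare-metal"))       # None
```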

After all these steps, you will understand for yourself exactly in which direction to develop your own cloud.

Source: https://habr.com/ru/post/143891/
