
Test automation: Acronis Kernel “drone”


( http://bp-la.ru/bespilotnyj-apparat-danem )


Build => Test => Failed => and then kilometers of logs scattered across different systems, and tens of minutes spent piecing things together in search of the cause of the failure. Sound familiar?


What if it went like this instead?


Build => Test => Failed => Ticket in JIRA - and the developer picks up the bug right away, because all the information is already there.


Working in the Acronis Kernel team, I set out to create exactly this kind of automated testing.
Read on for my story.


Introduction


Software testing is an investigation conducted to provide interested parties (stakeholders, hereinafter Customers) with information about the quality of a product or service (from Wikipedia).
Customers perceive test results in different ways:



Data should be available as soon as possible, ideally in real time, immediately after a new product build appears.


Here is a workflow sufficient to meet these requirements:


  1. Tests start as soon as a new product build is available;
  2. The test execution time is determined by the company's development process, but must not exceed the average interval between new builds;
  3. Detected errors are analyzed automatically: already known errors are attributed to existing defects, the rest are filed as new defects - within a few minutes of detection;
  4. Test results are shown on the "Product Quality Card".

Here in the Acronis Kernel team we built such a process - not overnight, of course.
First, let me describe where we started.


Prehistoric



( http://spongebob.wikia.com/wiki/Primitive_Sponge )


Mechanization



It all worked like this:


  1. A new build appeared.
  2. The Control Center (CC) detected the new build and, according to the test plan stored in it, created new tasks in TestLink.
  3. ATMS found the tasks in TestLink and requested resources for them from the hypervisor. There was no task queue: whoever grabbed the resource first got it.
  4. Having obtained the required set of VMs, ATMS configured the guest OS on them. The configuration recipe was written in a custom DSL.
  5. Control was then handed over to a Python library, which finished configuring the environment, deployed the build and ran the tests.
  6. When the tests completed, ATMS collected the logs and test results and updated the task status in TestLink.
  7. CC saw that the task in TestLink was complete, retrieved the results, updated its statistics database and emailed a report on the test results. Later, the Control Center took over the functions of TestLink: tasks were created in its internal database, which emulated TestLink for its client, ATMS.

Tests ran for several hours and often gave random (non-reproducible) results. Analyzing failures was a whole quest: visiting ATMS, CC and the network shares with logs, reading the logs in detail and searching for similar bugs in Jira - all by hand.


The failures we filed were sometimes reproducible, but more often not. For most bugs, the developers asked us to clarify the steps, provide a virtual machine or attach forgotten logs.


About once a week, ATMS crashed. If a test hung, or the resources were not released for some other reason, you had to manually delete the virtual machines, remove the task from ATMS and reset the host's busy counter.


Test results for different builds could be compared using static email reports with a Result Type / Build Number chart, or by selecting the results manually in CC. To compare the results of the same test on different operating systems, I had to look through the test logs from each OS by hand.


As a result, developers did not trust the autotests and relied more on running their own tests manually in their own environments. This "mechanization" did not suit me at all; the situation had to be fixed.


Brave new world



( http://dkrack.wikispaces.com/Brave+New+World )


The architecture of the new autotest system was based on:



0 (zero) reinvented wheels.


Task path


  1. After a successful build, the build server (also Jenkins) triggers the project on the test Jenkins, which puts the test in the queue.
  2. The test Jenkins reserves resources (a linked-clone VM), checks out the latest test code from SVN, runs a CMD script to set up the environment, and calls pytest.
  3. Pytest selects the test cases using its built-in test discovery and starts the test. The framework code runs on the Gate VM, the control machine, while the System Under Test (in our case, a kernel driver) is deployed on the Test VM, so that results are not lost in case of a BSOD.


    • The standard Python logging library writes the info log and the debug log to two separate files:
      a) The info log contains the test steps and meets two requirements: 1) a human-readable format, 2) enough information to reproduce the failure.
      b) The debug log includes a timestamp, the address \ line number of the code being executed and the full message. It lets you trace the detailed history of events that are not directly related to the essence of the test but affect the result: whether a connection was established, how long the reboot took, etc.


    • The test stops at the first failure (an assert evaluates to False). Pytest writes the result and the traceback to a JUnit XML file.

  4. Jenkins (JUnit Plugin) publishes the report and starts a Python script that files bugs.
  5. The script searches Jira for already known open bugs. If it finds one, it leaves a comment along the lines of "Reproduced in such-and-such place"; if not, it files a new bug. The error message (pytest assert) goes into the summary, the steps from the info log go into the description, and the test logs and the drivers themselves are attached to the bug.
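
The two-file logging from step 3 could be set up roughly as follows. This is only a minimal sketch, not the team's actual code: the function name `build_test_logger`, the file names `info.log` and `debug.log`, and the exact format strings are assumptions made for illustration.

```python
import logging
import os
import tempfile


def build_test_logger(log_dir, name="autotest"):
    """Return a logger that writes INFO+ to info.log and everything to debug.log.

    File names and formats are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    # Drop handlers left over from a previous call so lines are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    # Info log: human-readable test steps only.
    info = logging.FileHandler(os.path.join(log_dir, "info.log"), encoding="utf-8")
    info.setLevel(logging.INFO)
    info.setFormatter(logging.Formatter("%(message)s"))

    # Debug log: timestamp, source file and line number, level, full message.
    debug = logging.FileHandler(os.path.join(log_dir, "debug.log"), encoding="utf-8")
    debug.setLevel(logging.DEBUG)
    debug.setFormatter(
        logging.Formatter("%(asctime)s %(filename)s:%(lineno)d %(levelname)s %(message)s")
    )

    logger.addHandler(info)
    logger.addHandler(debug)
    return logger


def demo():
    """Write one step and one diagnostic message, then return both file contents."""
    log_dir = tempfile.mkdtemp()
    log = build_test_logger(log_dir)
    log.info("Step 1: install driver on Test VM")
    log.debug("connection established after 3 retries")
    for handler in log.handlers:
        handler.flush()
    with open(os.path.join(log_dir, "info.log"), encoding="utf-8") as f:
        info_text = f.read()
    with open(os.path.join(log_dir, "debug.log"), encoding="utf-8") as f:
        debug_text = f.read()
    return info_text, debug_text
```

With this split, the info log stays short enough to paste into a bug description, while the debug log keeps the surrounding details.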

Here is a diagram for clarity:



(© Acronis)


The bug's name is appended as a suffix to the VM name, so developers can easily find the machine if needed. A machine on which an already known bug was reproduced is removed automatically after three days. A machine with a new bug is removed automatically once the developer moves the bug to Resolved status and the corresponding test passes without errors.
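
The naming and retention rules above can be expressed as two small functions. This is a sketch under assumptions: the function names, the underscore separator, the example bug key and the use of ">= 3 days" as the cutoff are illustrative and not taken from the real system.

```python
from datetime import datetime, timedelta

# Retention period for VMs that reproduced an already known bug.
KNOWN_BUG_RETENTION = timedelta(days=3)


def tag_vm_name(vm_name, bug_name):
    """Append the bug name to the VM name (the separator is an assumption)."""
    return f"{vm_name}_{bug_name}"


def should_delete_vm(is_new_bug, reproduced_at, now,
                     bug_status="Open", test_passes=False):
    """Decide whether a VM kept for a bug can be removed.

    Known bug: remove once three days have passed since reproduction.
    New bug: remove once the bug is Resolved and the test passes again.
    """
    if not is_new_bug:
        return now - reproduced_at >= KNOWN_BUG_RETENTION
    return bug_status == "Resolved" and test_passes
```

For example, `should_delete_vm(True, reproduced_at, now, "Resolved", False)` keeps the machine, because the fix has not yet been confirmed by a passing test.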


An example of an automatically filed bug



(© Acronis)


Previously, an automation engineer had to spend 80-90% of their time analyzing test results by hand. Now it is enough to look at the list of bugs in Jira. Product bugs go to the developers; the automation engineer takes the test failures. If a bug report lacks information, there is no need to teach people to file bugs differently - just change the code.
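
The decision logic of the bug-reporting script from the task path (comment on a known open bug, otherwise file a new one) might look like the sketch below. It deliberately avoids any real Jira client: `InMemoryTracker` and its methods are hypothetical stand-ins, and matching known bugs by identical summary, as well as the wording of the comment, are assumptions, since the article does not say how the real script does this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Failure:
    assert_message: str        # pytest assert message, used as the bug summary
    info_log_steps: List[str]  # steps from the info log, used as the description
    attachments: List[str] = field(default_factory=list)  # test logs, drivers


class InMemoryTracker:
    """Hypothetical stand-in for a bug tracker; this is not a real Jira API."""

    def __init__(self) -> None:
        self.bugs: Dict[str, dict] = {}
        self._next_id = 1

    def find_open_bug(self, summary: str) -> Optional[str]:
        # Assumption: a known bug is one that is open and has the same summary.
        for key, bug in self.bugs.items():
            if bug["status"] == "Open" and bug["summary"] == summary:
                return key
        return None

    def add_comment(self, key: str, text: str) -> None:
        self.bugs[key]["comments"].append(text)

    def create_bug(self, summary: str, description: str,
                   attachments: List[str]) -> str:
        key = f"BUG-{self._next_id}"
        self._next_id += 1
        self.bugs[key] = {
            "status": "Open",
            "summary": summary,
            "description": description,
            "attachments": list(attachments),
            "comments": [],
        }
        return key


def report_failure(tracker: InMemoryTracker, failure: Failure,
                   build: str) -> Tuple[str, bool]:
    """Return (bug key, created?) for a failed test."""
    existing = tracker.find_open_bug(failure.assert_message)
    if existing is not None:
        # Known bug: only record where it was reproduced.
        tracker.add_comment(existing, f"Reproduced on build {build}")
        return existing, False
    # Unknown failure: file a new bug with steps and attachments.
    description = "\n".join(failure.info_log_steps)
    key = tracker.create_bug(failure.assert_message, description,
                             failure.attachments)
    return key, True
```

Because the report is assembled in one function, adding a missing piece of information to every future bug is a one-line change rather than a process change.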


An example of a developer communicating with the automatic bug reporter



(© Acronis)


Test maintenance has come down to handling, in code, the types of failures not yet accounted for. There will always be corner cases; accept this, and do not aim to eliminate 100% of autotest / test infrastructure failures. It is enough to turn these failures into specific action items - bugs in Jira, in our case - and fix them one by one.


Product Quality Card


A general overview of the state of the tested components can now be obtained by looking at the Jenkins dashboard:



(© Acronis)


The dashboard is implemented with the Dashboard View plugin: https://wiki.jenkins-ci.org/display/JENKINS/Dashboard+View .


Not all readers may be familiar with Jenkins, so I'll explain what the columns mean:



Results


We finished building and debugging the system described above by the end of last fall, and then actively added new test scenarios. In February 2016, I switched full time to another project.


During my absence (six months):



The project lived for six months and was developed by the developers alone, without a single tester. The developers added a new component on their own, creating Jenkins projects and Python code by analogy with the existing ones.


There were also quite a few incorrect bugs filed during this time, mostly duplicates caused by a misconfigured new test or by test server failures. However, that is a topic for a separate article.



Source: https://habr.com/ru/post/282682/

