In the last article, we talked about the system we created under the name ULA (Unified Logfile Analyzer). Its main functions are collecting and aggregating incoming error messages using the shingles algorithm, making decisions based on them, and automatically notifying about problems in the test environment. Today we will share our experience of finding and fixing bugs while rolling out this system, along with our plans.
At the moment, a little less than half of the planned projects are connected to the system. Why not all of them? Perhaps the employees did not receive enough information, or perhaps they simply did not believe in the product; as it turned out, it was not enough just to tell managers about it and arrange a few presentations. In projects where developers from the ULA team were involved, adoption went better and the initial rollout required little effort.
We rolled ULA out through meetings with management and testers: we presented the system and demonstrated its main functions. After several such sessions, requests to connect gradually started coming in from ESF autotest developers. Perhaps the start would have been smoother if we had announced the tool in advance, so that users were waiting for the release.
Typical questions we were asked:
- Does your system compete with HP ALM?
Perhaps in the future, in terms of collecting metrics for automated testing.
- Does your system know how to aggregate the logs of the ESF systems themselves?
Not at the moment, but in the future we will implement analysis of the systems' own logs. For now, this data is collected and attached to the tests as additional information.
- Why not the ELK stack (Elasticsearch, Logstash, Kibana)?
We need more complex processing logic, decision-making functionality, integration with HP SM and HP ALM, and the ability to work with the source data on demand for a specific request; we do not need a constant stream of data from the system logs.
- And who will use the system?
Here things are less clear-cut. The assumption was that errors would be analyzed by a team of engineers that mostly does manual testing and reviews autotest results. But this is not always the case: in brand-new projects, autotest developers or other engineers are often the ones doing the triage. So now it is important for us to understand the situation in each project and identify who needs to be trained to work with the system.
Now about the problems that we faced after connecting several ESF projects.
The quality of autotest logs
The main problem that forced us to change the basic algorithm is the same-looking stack traces in the logs written by the autotests. A number of tests use TestNG, and on an error the developers write the full stack trace generated by the framework to the log. As a result, up to 80% of the length of an error message looks the same. Something had to be done about it, and quite urgently. Cutting off part of the log and not processing it at all would be plainly wrong, so we decided to introduce weighted shingles, i.e. to assign weights to the canonized phrases cleared of "garbage" words. The classical algorithm has no such notion.
Later, once enough statistical data has been gathered, we will derive a proper polynomial for the weights. For now, after looking through several hundred messages, we decided to use a slightly adjusted arctangent function: the first 20 to 30 words of the message carry the main significance, then a gentle decline begins (the start of the stack trace), and the tail of the trace matters least. In the future we may need to make the algorithm's parameters depend on the subsystem under test and the framework used.
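To make the idea more concrete, here is a toy Java sketch of position-weighted shingles. It is not the production code (the real processing runs inside the DBMS), and the shingle length, the 25-word plateau and the decay coefficient are illustrative assumptions; the similarity is computed as a simple weighted Jaccard measure over the canonized words.

import java.util.*;

// Toy sketch of weighted shingles: the first ~25 canonized words keep full
// weight, the stack-trace tail decays along an arctangent curve.
public class WeightedShingles {

    static final int SHINGLE_LEN = 3; // words per shingle (assumed)
    static final int PLATEAU = 25;    // words that keep full weight (assumed)

    // Arctangent-shaped decay: 1.0 on the plateau, smoothly falling afterwards.
    static double weight(int wordPos) {
        if (wordPos <= PLATEAU) return 1.0;
        return 1.0 - Math.atan((wordPos - PLATEAU) / 15.0) / (Math.PI / 2);
    }

    // Canonized message (list of words) -> shingle text mapped to its weight.
    static Map<String, Double> shingles(List<String> words) {
        Map<String, Double> result = new HashMap<>();
        for (int i = 0; i + SHINGLE_LEN <= words.size(); i++) {
            String shingle = String.join(" ", words.subList(i, i + SHINGLE_LEN));
            result.merge(shingle, weight(i), Math::max);
        }
        return result;
    }

    // Weighted Jaccard-style similarity of two canonized error messages.
    static double similarity(List<String> a, List<String> b) {
        Map<String, Double> sa = shingles(a), sb = shingles(b);
        double common = 0, total = 0;
        for (Map.Entry<String, Double> e : sa.entrySet()) {
            Double other = sb.get(e.getKey());
            if (other != null) {
                common += Math.min(e.getValue(), other);
                total  += Math.max(e.getValue(), other);
            } else {
                total += e.getValue();
            }
        }
        for (Map.Entry<String, Double> e : sb.entrySet()) {
            if (!sa.containsKey(e.getKey())) total += e.getValue();
        }
        return total == 0 ? 0 : common / total;
    }
}

With weights like this, two messages that differ in their first sentences but share a long identical TestNG trace no longer come out as almost identical.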
Performance
Although load testing was carried out in every sprint during development, it did not save us from a number of performance problems once real projects were connected. We found that:
- the load is distributed very unevenly;
- real error messages are sometimes so large that analyzing one of them on the DBMS server can take almost a second.
At times the queue receives up to 200 messages per second and they begin to pile up. In the end everything does get processed without critical incidents, but a 100% busy CPU affects the operation of the WEB services. Here is what we have done so far to address the performance problems:
- added several CPUs to the database server to enable parallelization of error handling;
- moved shingles processing to a separate Oracle AQ subscriber with its own procedure, which deals only with error handling and nothing else; information about autotest steps is inserted independently of this function, and the data is synchronized at the end of the launch;
- reworked the message loading algorithm, switching from batch mode to online loading without delay.
However, the performance issue is not fully resolved, and the team continues to work on it.
Thread synchronization in the DBMS
The Oracle AQ queue is processed by a procedure associated with a subscriber. The DBMS manages the multithreading, but under heavy load we ran into a problem.
The point is that the system has to keep track of incoming messages (for us, one message is a record of a test step). Counters are grouped by unique launch IDs. This is needed to compare the number of incoming messages with the expected number and understand whether the launch is complete, to build the test tree, and to display the aggregated error table. Such a counter cannot be maintained without some form of thread synchronization. At first we reinvented the wheel and made a MUTEX table that was locked for a fraction of a second while the counter value was being calculated. Under heavy load we started catching deadlocks. Then we switched to the DBMS_LOCK package and put a lock around the piece of code that works with the counter. For a long time we could not understand why the counter sometimes showed a wrong value, but in the end we concluded it was a synchronization problem. For those interested, we recommend reading this
article about the pitfalls of locks.
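In ULA itself the lock lives inside the PL/SQL procedure tied to the AQ subscriber, but the idea can be illustrated with a small JDBC sketch: serialize the counter update with DBMS_LOCK (exclusive mode 6, short timeout). The launch_counters table, the column names and the timeout value here are assumptions made up for the example.

import java.sql.*;

// Illustrative sketch only: acquire a per-launch DBMS_LOCK, update the counter,
// release the lock. Assumes autocommit is off on the connection.
public class LaunchCounter {

    private static final int X_MODE = 6;      // DBMS_LOCK exclusive mode
    private static final int TIMEOUT_SEC = 5; // how long to wait for the lock

    public static void incrementCounter(Connection conn, long launchId) throws SQLException {
        String lockHandle;
        // Map a readable lock name to an internal lock handle.
        try (CallableStatement cs = conn.prepareCall("{call DBMS_LOCK.ALLOCATE_UNIQUE(?, ?)}")) {
            cs.setString(1, "ULA_LAUNCH_" + launchId);
            cs.registerOutParameter(2, Types.VARCHAR);
            cs.execute();
            lockHandle = cs.getString(2);
        }
        // Take the exclusive lock; DBMS_LOCK.REQUEST returns 0 on success.
        try (CallableStatement cs = conn.prepareCall("{? = call DBMS_LOCK.REQUEST(?, ?, ?)}")) {
            cs.registerOutParameter(1, Types.INTEGER);
            cs.setString(2, lockHandle);
            cs.setInt(3, X_MODE);
            cs.setInt(4, TIMEOUT_SEC);
            cs.execute();
            if (cs.getInt(1) != 0) {
                throw new SQLException("Could not acquire the launch lock, code " + cs.getInt(1));
            }
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE launch_counters SET received = received + 1 WHERE launch_id = ?")) {
            ps.setLong(1, launchId);
            ps.executeUpdate();
            conn.commit();
        } finally {
            // Always give the lock back, even if the update failed.
            try (CallableStatement cs = conn.prepareCall("{? = call DBMS_LOCK.RELEASE(?)}")) {
                cs.registerOutParameter(1, Types.INTEGER);
                cs.setString(2, lockHandle);
                cs.execute();
            }
        }
    }
}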
Versatility
We position the system as universal: to connect a new framework, it is enough to write a parser for its autotest reports. In practice, even for Allure this turned out to be quite difficult: the same thing can be recorded in a report in different ways, and we have no common rules. As a result, we had to keep making fixes for two weeks, and most likely this is not the end. We even had to dig into Allure's own code, but more on that later.
System limitations and design errors
- The similarity algorithm has a restriction: the system only works with errors consisting of more than 4 words. Although we have not yet encountered such short texts in the EFS, we are choosing a new algorithm just in case.
- Logs of the ESF subsystems are not aggregated yet, because the logging rules in the ESF are not fully worked out. Right now we cannot always extract useful information from the text log files, for example, which session a record belongs to. At first the system will only use this information to enrich the data about errors found by autotests.
- When designing the system, we focused on HP ALM and its field sizes: test ID, test name, step name. But besides the GUI and API tests of the EFS, we are starting to work with autotests for mobile applications, whose information goes to Jira, where the field sizes are significantly larger than in ALM and, in addition, the test ID field is text. At the time of writing, we have already prepared a patch that fixes the problem and affects the production data.
Allure
The first Allure problem we encountered is the difference between the adapters for different frameworks. This is not specific to our autotests, it is common practice. The testClass and testMethod labels by which we identified a test are a feature of the TestNG adapter; other adapters do not provide them by default. Adding the two labels turned out to be easy, since the model utilities (AllureModelUtils) already had these methods:
public static Label createTestClassLabel(String testClass) {
    return createLabel(LabelName.TEST_CLASS, testClass);
}

public static Label createTestMethodLabel(String testMethod) {
    return createLabel(LabelName.TEST_METHOD, testMethod);
}
We decided not to rewrite the parser logic, but to create our own listener that adds these two labels.
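A rough sketch of that listener idea is below. The TestNG part is standard; how exactly the labels get attached to the current test case depends on the adapter version, so attachToCurrentTestCase() is a hypothetical hook, and the Allure package names may differ between 1.x builds.

import org.testng.IInvokedMethod;
import org.testng.IInvokedMethodListener;
import org.testng.ITestResult;
import ru.yandex.qatools.allure.model.Label;            // Allure 1.x model class
import ru.yandex.qatools.allure.utils.AllureModelUtils; // package name may differ between versions

// Sketch only: derive testClass/testMethod from TestNG and build the labels
// with the AllureModelUtils helpers quoted above; the attach step is left abstract.
public abstract class TestClassLabelListener implements IInvokedMethodListener {

    // Hypothetical hook: append labels to the test case currently being reported.
    protected abstract void attachToCurrentTestCase(Label... labels);

    @Override
    public void beforeInvocation(IInvokedMethod method, ITestResult result) {
        if (!method.isTestMethod()) {
            return; // ignore configuration (before/after) methods
        }
        String testClass  = method.getTestMethod().getTestClass().getName();
        String testMethod = method.getTestMethod().getMethodName();
        attachToCurrentTestCase(
                AllureModelUtils.createTestClassLabel(testClass),
                AllureModelUtils.createTestMethodLabel(testMethod));
    }

    @Override
    public void afterInvocation(IInvokedMethod method, ITestResult result) {
        // nothing to do after the call
    }
}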
The second problem we encountered is specific to TestNG. The adapter creates separate test cases for the configuration (before) methods if an error occurred during their execution, while the tests themselves get the canceled status. As a result, we received duplicate tests in our system.
A fix for this Allure behavior was planned in the Allure 2.0 roadmap, but most projects still used version 1.5 or even lower, and our parser was written primarily for these versions. We could not wait, so once again we solved it by adjusting the listener.
Multi-browser support
When designing the UI, we chose React JS and focused on Google Chrome. We showed the system to management, started testing it in other browsers, and it turned out that almost nothing worked there. In the future we will have to devote more time to multi-browser compatibility. At the moment, the WEB part of the system works in the latest versions of Google Chrome, Mozilla Firefox, and MS IE.
Shoemaker without shoes
We got so carried away with other people's logs that we forgot about our own. Of course we had them, but the level of detail turned out to be insufficient. When real operation began and problems started pouring in, we had to spend several days going through all the functionality and adding proper logging to the system itself. Logs are now written for errors in queue processing, in the called procedures, and in the system services themselves; every user action is logged as well.
Rush
To speed up getting the system into production, plain bash was initially used to find the needed pieces of logs in the file system of the ESA test environment. We wrote a script that walks the directories, unpacks the necessary files, searches for entries belonging to the given session, and writes intermediate results to a (rather large) temporary file. That last step was a mistake, and the whole solution turned out to be a dead end, unacceptable for us. By now we have rewritten almost all of this functionality in Java, and the intermediate results are kept entirely in memory.
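A simplified sketch of the Java replacement is below: walk the log directories, pull the lines that belong to a session and keep them in memory, grouped by file, instead of writing a temporary file. Archive unpacking is omitted here, and the file-name filter and matching rule are illustrative.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the in-memory log collector that replaced the bash script.
public class SessionLogCollector {

    // Collect all log lines mentioning the given session id, grouped by file.
    public static Map<Path, List<String>> collect(Path logRoot, String sessionId) throws IOException {
        Map<Path, List<String>> result = new LinkedHashMap<>();
        try (Stream<Path> files = Files.walk(logRoot)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().endsWith(".log")) // plain logs only in this sketch
                 .forEach(p -> {
                     List<String> hits = grep(p, sessionId);
                     if (!hits.isEmpty()) {
                         result.put(p, hits); // everything stays in memory, no temp files
                     }
                 });
        }
        return result;
    }

    private static List<String> grep(Path file, String sessionId) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(l -> l.contains(sessionId)).collect(Collectors.toList());
        } catch (IOException | UncheckedIOException e) {
            return Collections.emptyList(); // unreadable file: just skip it
        }
    }
}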
Future plans
In the near future we plan to:
- deploy the system into live operation for testing mobile devices;
- integrate with Jira so that defects in mobile applications can be filed there;
- implement automatic notification about open blocking defects, so that the user can adjust the list of tests for the next launch;
- develop an algorithm for pre-filling fields, which will save time when filing new defects;
- develop an algorithm for automatically creating requests to the administrators for servicing the test environment, which will streamline that work;
- refine the fuzzy-matching search algorithm to make it more flexible, so that the system makes fewer mistakes when aggregating errors;
- implement automatic categorization of errors: based on previously made decisions and manually entered rules, the system will immediately split error messages into categories, allowing the user to focus primarily on the important problems;
- enrich API tests with network traffic (requests and responses);
- create automatic scanning of an entire launch: all system logs are scanned for errors that occurred during the session, so the user will be able to take a launch that looks successful from the autotest developer's point of view and run an automatic check for errors in the system itself.
Despite all the bugs, we are optimistic and believe the system will help engineers significantly reduce the time spent analyzing results and improve the quality of that analysis. By now we have accumulated a sizable backlog, and implementing it will give us new, interesting experience and make the product better. We will be happy to answer your questions on the topic and to learn about your own practices and cases.