
About testing metrics: code coverage for testers

As we know from The Hitchhiker's Guide to the Galaxy, the answer to the ultimate question of life, the universe, and everything is 42. Line coverage on one of my projects is 81%. Does this figure answer the main question of testing: "how many tests are enough to determine the quality of the product"?

In my years of working in IT and testing, I have seen few teams and projects where testers actually make use of code coverage. In my opinion, this comes down to two things:

1. Testers test the requirements first and foremost;
2. Not everyone understands how to measure and use coverage.
For those interested, here is my take on these two points.

Requirements vs code


The tester tests the requirements. Even if they are not formalized, there is still an idea of how the system should behave, and in the end this, and only this, is what matters.
But
there are no clear, exhaustive, complete requirements such that, having checked each of them, you could safely say that the system works as it should and there are no bugs.

Example 1

The application tries to save data to a database (located on another server). There is a description of how it should do this, including the requirement that if the operation cannot be performed (no access to the database, for example), the application should keep retrying until a certain timeout expires, and then return an error to the client.

What does "impossible to perform the operation" mean?

Suppose the tester checks the scenario where the connection to the database is lost mid-operation. Everything works fine, but does that mean there are no bugs?
In that application we looked at the code coverage of the relevant classes, and it turned out the developer had handled about five distinct exceptional situations in the code.

This meant, at a minimum, the following cases:
1. A connection to the database server cannot be established;
2. The connection is established, but executing the query causes an Oracle error;
3. The connection is established, the query starts executing and hangs. This was the bug: the application waited for a response for about five minutes, then dumped an exception into the logs and never tried to write that data again.

A couple of the remaining ones were not worth attention, for various reasons.
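
For illustration, here is a minimal sketch of what such handling might look like; the class name, the SQL, the timeout value, and the exact exception types are my assumptions, not the project's actual code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.SQLRecoverableException;
    import java.sql.SQLTimeoutException;

    // Hypothetical sketch of "retry until the timeout expires, then report an error".
    public class DataSaver {

        private static final long TIMEOUT_MS = 30_000; // illustrative value

        public void save(String jdbcUrl, String payload) {
            long deadline = System.currentTimeMillis() + TIMEOUT_MS;
            SQLException lastError = null;
            while (System.currentTimeMillis() < deadline) {
                try (Connection con = DriverManager.getConnection(jdbcUrl);
                     PreparedStatement ps = con.prepareStatement(
                             "INSERT INTO data_log(payload) VALUES (?)")) {
                    ps.setString(1, payload);
                    ps.executeUpdate();
                    return;                           // saved successfully
                } catch (SQLRecoverableException e) { // case 1: no connection
                    lastError = e;                    // keep retrying
                } catch (SQLTimeoutException e) {     // case 3: the query hung
                    lastError = e;                    // the branch where the bug hid
                } catch (SQLException e) {            // case 2: server-side error,
                    lastError = e;                    // e.g. an ORA-xxxxx from Oracle
                }
            }
            // Timeout expired: report the error to the client.
            throw new IllegalStateException("could not save data before timeout", lastError);
        }
    }

Each catch branch here is a separate exceptional situation; a test that only drops the connection exercises just one of them.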

In this example the requirement was formally verified by case 1, yet the bug was found only after analyzing code coverage. One could argue that this is not an example of the benefit of code coverage but of the benefit of communication within the team (the tester could have asked the developer about the implementation details in advance, or given the developer the test cases for review). In fact, I always do exactly that, but not everyone will ask, and uncovered blocks of code often draw attention to things nobody thought to mention.

Example 2

In another system I tested, when data consistency was lost, the application had to raise the corresponding event, send a notification to monitoring, and wait for people to come and rescue it. Tests covered various cases of such situations, and everything was handled properly.
We looked at the code: the required piece was well covered, but in another class I noticed an uncovered block of code that raised the same consistency-loss event. Under what conditions it could fire, nobody knew: it turned out to have been copied from an old project, and no one remembered it, so the developers quickly cut it out. We would not have found it without analyzing the code.

So let the tester test the requirements; but if they also look at the code, they can catch things that are not described in the requirements and that even clever test-design techniques will not always find.

Coverage = 80%. What about quality?


Quantity does not mean quality. A code coverage number is not directly related to product quality; the relation is indirect.
At one reporting meeting, I stated that code coverage had grown to 82% for lines and 51% for conditions, after which management asked me: "What does that mean? Is it good or bad?" A logical question, really: how much is enough to be good?
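
As an aside, the gap between the two numbers is easy to see in a toy example (invented code, not from the project): a compound condition sits on a single line, so one test can cover every line while exercising only some of the condition outcomes.

    // Invented example: one line of code, several condition outcomes.
    public class Coverage101 {

        static int discount(boolean vip, boolean bigOrder) {
            if (vip && bigOrder) { // a single line, but two conditions
                return 20;
            }
            return 0;
        }

        public static void main(String[] args) {
            // This single call executes every line (100% line coverage) but
            // exercises only the "true" outcome of each condition: 2 of 4 = 50%.
            // The exact numbers depend on how your tool counts conditions.
            System.out.println(discount(true, true));
        }
    }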

Some developers cover their code aiming for 100%. For a tester, chasing 100% is pointless: from some point on you will run into code that physically cannot be reached by integration tests.
For example, developers consider it good form to check a method's input parameters for null, although in a really running system such cases may never occur (our roughly 50% condition coverage was partly due to this). And that is normal: null could only come from outside up to the first check, which handles the situation itself.
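
A minimal sketch of that pattern (invented code): the null branch in the inner method cannot fire through the public entry point, so integration tests will never cover it.

    import java.util.Objects;

    // Invented example: a defensive null check that integration tests cannot reach.
    public class Tariff {

        // Public entry point: rejects null immediately.
        public int priceFor(String customerId) {
            Objects.requireNonNull(customerId, "customerId");
            return basePrice(customerId);
        }

        // Checks its argument again out of good form; via priceFor(...)
        // this branch is unreachable, so it stays uncovered.
        private int basePrice(String customerId) {
            if (customerId == null) {
                throw new IllegalArgumentException("customerId is null");
            }
            return customerId.startsWith("vip") ? 80 : 100;
        }
    }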

On the subject of "this is normal": it is a qualitative assessment of the uncovered code that, in my understanding, leads to adequate use of code coverage. What matters is what is not covered, not how much. If it is Java code and the uncovered parts are toString() and equals() methods or exception branches that are hard to reproduce in integration, well and good; let it be 80% coverage of the real business logic. Many tools can filter such "excess" code out and not count it.
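
The filtering is tool-specific, but as one concrete example: recent JaCoCo versions (0.8.2 and later, as I understand it) automatically exclude classes and methods marked with any annotation whose simple name contains "Generated" and whose retention is class or runtime, so "excess" methods can be excluded with a home-grown marker annotation. A sketch:

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    // For JaCoCo's filter the simple name just has to contain "Generated"
    // and the retention must be CLASS or RUNTIME.
    @Retention(RetentionPolicy.CLASS)
    @interface Generated {}

    public class Money {
        private final long cents;

        public Money(long cents) { this.cents = cents; }

        @Generated // dropped from the coverage report by the filter
        @Override
        public String toString() {
            return String.format("%d.%02d", cents / 100, cents % 100);
        }
    }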
If doubts about the white spots still remain, you can compute the combined coverage of the integration tests and the unit tests: the developers have probably already covered many of the things that are hard to reach with integration tests.
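With JaCoCo, for instance, the execution data of the two runs can be combined with its command-line client (a sketch; the file names are illustrative, and the CLI appeared only in newer versions of the tool):

    java -jar jacococli.jar merge unit-tests.exec integration-tests.exec --destfile total.exec

The combined report is then built from total.exec.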

However, there is one "but": what if coverage is low? 20%, 30%? Somewhere I read the amusing claim that coverage of 50% or less (in lines and conditions, as I recall) corresponds to a level of testing whose result is the same as no testing at all. That is, there may be bugs, there may be none; you might as well not have tested. The alternative explanation is a lot of dead code, which is unlikely.

And we do not have autotests


And you don't need them. Even if you are assured of the opposite: some developers are simply unaware that coverage can be measured not only by unit tests. There are tools that record coverage at runtime: you deploy a specially instrumented build, run your tests against it, and it records the coverage.
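
In the Java world one such tool is the JaCoCo runtime agent: attach it to the application under test, and whatever you run against that instance, manual or automated, gets recorded. A sketch (the paths and options are illustrative):

    java -javaagent:jacocoagent.jar=destfile=coverage.exec,append=true -jar application.jar

After the test session, coverage.exec contains the execution data from which a report can be built.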

What's the point?


A friend of mine, an excellent test lead, asked: "When the test cases are not all written yet and automation is in its infancy, does it make sense to spend resources on evaluating code coverage?" Introducing anything new into the process always causes management pain: time, resources, and the other frailties of existence leave no room for the flights of a dreamer tester.

Let's go through, in order, where you will have to spend resources if you decide to try measuring code coverage:

  1. Choosing a tool that fits your application.
  2. Instrumenting the builds (including configuring the coverage tool and filtering out code that is "unnecessary" for evaluation).
  3. Building the coverage report after the test run.
  4. Analyzing the coverage.


Points 1 and 2 can be handed over to the developers: some of them have heard of or worked with the well-known tools, and they are in any case better placed to instrument their own build. Building the report is usually a single command on the command line, or happens automatically if you use CI (Jenkins did this for me and also published the report).
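For example, with JaCoCo's command-line client that single report command looks roughly like this (a sketch; the paths are illustrative):

    java -jar jacococli.jar report coverage.exec --classfiles build/classes --sourcefiles src/main/java --html coverage-report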
The most expensive is point 4. The main difficulty is that, for an adequate assessment, you must be able to read the code, or sit down with a developer so that they can explain what a given piece means and how to reproduce it. This requires a certain qualification from the test engineer, plus the working time of one or two people.

Whether it is worth it is for the team and its leads to decide. In projects where the requirements are poorly formalized, or where bugs appear in ways inexplicable to testers, coverage may at least help you understand which direction to dig in.
Another category is projects with very high-level black-box testing. This is primarily testing through the UI or the external API of a system containing a heap of logic that lives by its own laws: from the outside you can neither touch nor control it, which means you cannot test it properly. Coverage analysis in such projects builds a reasoned case for moving to lower levels of testing: unit, component, testing with stubs, and so on.
Accumulated coverage numbers also work well: on the charts you can see the moments when new code was merged in and the tests have not yet caught up; and if coverage was high, then started to decline and never climbed back to its previous level, there may be a decent white spot somewhere that requirements-based testing failed to reach, and so on.

Perhaps this is all I wanted to say for today.

Finally, limitations and out of scope


  1. I have tried to describe the general approach to the question without going into many technical details. Speaking of "80% coverage", I mean some kind of overall or average coverage, without referring to specific metrics such as line or condition coverage. The choice of specific metrics is a separate, interesting question.
  2. My experience is mostly with Java code and its tooling; I have not worked this way with other technologies. I know such tools exist for C++, but I have not yet had a chance to try them.
  3. A serious coverage analysis should be done on stable builds with stably passing tests; otherwise it is hard to say what caused the gaps: failing tests, critical bugs, or genuinely untested code.

Source: https://habr.com/ru/post/231583/

