Mutation Testing

Unit tests help us make sure that the code works the way we want it to. One common test metric is the percentage of code lines covered (Line Code Coverage).

But how reliable is this indicator? Does it have practical meaning, and can we trust it? After all, if we remove all the assert lines from the tests, or simply replace them with assertSame(1, 1), we will still have 100% Code Coverage, while the tests will test absolutely nothing.

How confident are you in your tests? Do they cover all branches of your functions? Do they even test anything?

The answer to these questions is given by mutation testing.

Mutation testing is a software testing method in which small changes are made to the source code and the automated test suite is checked for its reaction to them. If the tests still pass after the code has changed, then either the code is not covered by tests, or the tests that cover it are ineffective. The criterion that measures the effectiveness of a test suite is called the Mutation Score Indicator (MSI).

Let's introduce some concepts from the theory of mutation testing:

To use this technique, we obviously need source code and a set of tests (for simplicity, we will talk about unit tests).

After that, you can begin to change individual parts of the source code and see how the tests react to it.

One change to the source code is called a Mutation. For example, changing the binary operator "+" to the binary operator "-" is a code mutation.

The result of a mutation is a Mutant - that is, the new, mutated source code.

Each mutation of any operator in your code (and there are hundreds of them) leads to a new mutant, for which tests must be run.

In addition to changing "+" to "-", there are many other mutation operators (Mutation Operator, Mutator) - negating conditions, changing the return value of a function, deleting lines of code, etc.

So, mutation testing creates many mutants from your code, runs the tests for each of them and checks whether they pass or not. If the tests fail, everything is fine: they reacted to the change in the code and caught the error. Such a mutant is considered killed (Killed mutant). If the tests pass after the mutation, it means that either your code is not covered by tests at this place at all, or the tests covering the mutated line are ineffective and do not test this section of code well enough. Such a mutant is called a survivor (Survived, Escaped Mutant).
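The killed/survived distinction can be sketched in a few lines of PHP. This is only an illustration, not how any framework is implemented: the closures and the mutantStatus() helper are hypothetical names, and the "tests" are plain callables that return true when they pass.

```php
<?php
// Original code and one mutant produced by the "+" -> "-" mutation operator.
$original = function (int $a, int $b): int { return $a + $b; };
$mutant   = function (int $a, int $b): int { return $a - $b; };

// Two hypothetical "tests": each returns true when it passes.
$weakTest   = function (callable $sum): bool { return $sum(2, 0) === 2; };
$strongTest = function (callable $sum): bool { return $sum(2, 3) === 5; };

// A mutant is killed when at least one test fails against it.
function mutantStatus(callable $mutant, array $tests): string
{
    foreach ($tests as $test) {
        if (!$test($mutant)) {
            return 'killed';
        }
    }

    return 'survived';
}

echo mutantStatus($mutant, [$weakTest]), PHP_EOL;              // survived
echo mutantStatus($mutant, [$weakTest, $strongTest]), PHP_EOL; // killed
```

The weak test executes the mutated line, so it counts toward line coverage, but it cannot tell "+" from "-" because the second argument is zero.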

It is important to understand that mutation testing is not a chaotic code transformation, but an entirely predictable and understandable process: given the same set of mutation operators and the same source code under test, it always produces the same list of mutations and the same resulting metrics.

Consider an example. We will use Infection, a mutation testing framework (MF) for PHP.

Suppose we have a filter, written in object-oriented style, that keeps only the users in a collection who have reached the age of majority:

    class UserFilterAge
    {
        const AGE_THRESHOLD = 18;

        public function __invoke(array $collection)
        {
            return array_filter(
                $collection,
                function (array $item) {
                    return $item['age'] >= self::AGE_THRESHOLD;
                }
            );
        }
    }

And for this filter there is a unit test:

    public function test_it_filters_adults()
    {
        $filter = new UserFilterAge();

        $users = [
            ['age' => 20],
            ['age' => 15],
        ];

        $this->assertCount(1, $filter($users));
    }

The test is very simple - we add two users and expect the filter to return only one of them, the one who is 20 years old.

Note that with this test alone we already have 100% coverage of the source code of the UserFilterAge class. Let's run mutation testing and analyze the result:

 ./infection.phar --threads=4

With 100% code coverage, we only have 67% MSI - this is already suspicious.

How MSI is calculated

    Metrics:
        Mutation Score Indicator (MSI): 47%
        Mutation Code Coverage: 67%
        Covered Code MSI: 70%

Mutation Score Indicator (MSI)

MSI is 47%. This means that 47% of all generated mutants were defeated (killed, timed out, or caused errors). MSI is the primary metric of mutation testing. If Code Coverage is 65%, the difference is 18 percentage points, which indicates that line coverage is a poor criterion for evaluating the tests in this case.

Formula:

    TotalDefeatedMutants = KilledCount + TimedOutCount + ErrorCount;
    MSI = (TotalDefeatedMutants / TotalMutantsCount) * 100;

Mutation Code Coverage

This indicator is 67%. In general, it should be approximately equal to the Code Coverage indicator.

Formula:

    TotalCoveredByTestsMutants = TotalMutantsCount - NotCoveredByTestsCount;
    CoveredRate = (TotalCoveredByTestsMutants / TotalMutantsCount) * 100;

Covered Code Mutation Score Indicator

The MSI for the code that is covered by tests is 70%. This criterion shows how effective your tests really are: it is the percentage of defeated mutants among all mutants generated for code that the tests cover.

Formula:

    TotalCoveredByTestsMutants = TotalMutantsCount - NotCoveredByTestsCount;
    TotalDefeatedMutants = KilledCount + TimedOutCount + ErrorCount;
    CoveredCodeMSI = (TotalDefeatedMutants / TotalCoveredByTestsMutants) * 100;
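As a worked example, the three formulas can be evaluated together. The mutant counts below are hypothetical; they were picked only so that the results match the sample metrics shown earlier (47%, 67%, 70%). Rounding to whole percents is also an assumption made for display.

```php
<?php
// Hypothetical mutant counts, chosen to reproduce the sample metrics above.
$totalMutants = 100;
$killed       = 40;
$timedOut     = 5;
$errors       = 2;
$notCovered   = 33;

$defeated = $killed + $timedOut + $errors; // 47 defeated mutants
$covered  = $totalMutants - $notCovered;   // 67 mutants in covered code

$msi        = 100 * $defeated / $totalMutants;
$coverage   = 100 * $covered / $totalMutants;
$coveredMsi = 100 * $defeated / $covered;

printf("Mutation Score Indicator (MSI): %d%%\n", round($msi));  // 47%
printf("Mutation Code Coverage: %d%%\n", round($coverage));     // 67%
printf("Covered Code MSI: %d%%\n", round($coveredMsi));         // 70%
```

Note that Covered Code MSI divides by the number of covered mutants rather than by all mutants, which is why it is always at least as high as MSI.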

If we analyze the metrics, it turns out that MSI is 18 percentage points lower than Code Coverage. This suggests that, judged by mutation testing, the tests are much less effective than bare Code Coverage implies.

Let's look at the generated mutations.

First mutation:

     class UserFilterAge
     {
         const AGE_THRESHOLD = 18;

         public function __invoke(array $collection)
         {
             return array_filter(
                 $collection,
                 function (array $item) {
    -                return $item['age'] >= self::AGE_THRESHOLD;
    +                return $item['age'] > self::AGE_THRESHOLD;
                 }
             );
         }
     }

The tests run against it pass. That is, the change in the source code had absolutely no effect on the test results. This is not what we need.

Mutation testing told us that we can replace the ">=" condition with ">" and the program will work just as well. Remember, unit tests are supposed to guarantee that the program works the way we want? Since the tests pass on the mutated code, they are telling us that this behavior is what we expect.

This mutation shows that when testing code with range conditions, you should always check the boundary values.

Let's fix the situation and kill the mutant:

    /**
     * @dataProvider usersProvider
     */
    public function test_it_filters_adults(array $users, int $expectedCount)
    {
        $filter = new UserFilterAge();

        $this->assertCount($expectedCount, $filter($users));
    }

    public function usersProvider()
    {
        return [
            [
                [
                    ['age' => 15],
                    ['age' => 20],
                ],
                1
            ],
            [
                [
                    ['age' => 18],
                ],
                1
            ]
        ];
    }

We added one data set for the boundary value - 18. Now, if we run the tests against the mutated code again, they fail: the user aged 18 is filtered out and an empty collection is returned, which is of course wrong.
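To see why only the boundary data set kills this mutant, here is a small standalone sketch that does not use PHPUnit or Infection. The makeFilter() helper is hypothetical; it stands in for UserFilterAge so that the original and the mutated comparison can be run side by side.

```php
<?php
// Hypothetical stand-in for UserFilterAge: $isAdult decides which users are kept.
function makeFilter(callable $isAdult): callable
{
    return function (array $users) use ($isAdult): array {
        return array_filter($users, $isAdult);
    };
}

$original = makeFilter(function (array $u): bool { return $u['age'] >= 18; });
$mutant   = makeFilter(function (array $u): bool { return $u['age'] > 18; });

// The same data sets as in the data provider: [users, expected count].
$dataSets = [
    'one adult, one minor' => [[['age' => 15], ['age' => 20]], 1],
    'boundary value'       => [[['age' => 18]], 1],
];

foreach ($dataSets as $name => $dataSet) {
    list($users, $expected) = $dataSet;
    printf(
        "%s: original %s, mutant %s\n",
        $name,
        count($original($users)) === $expected ? 'passes' : 'fails',
        count($mutant($users)) === $expected ? 'passes' : 'fails'
    );
}
// one adult, one minor: original passes, mutant passes
// boundary value: original passes, mutant fails
```

The first data set cannot distinguish ">=" from ">" because neither user is exactly 18; the second one can.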

Second mutation:

     class UserFilterAge
     {
         const AGE_THRESHOLD = 18;

         public function __invoke(array $collection)
         {
    -        return array_filter(
    +        array_filter(
                 $collection,
                 function (array $item) {
                     return $item['age'] >= self::AGE_THRESHOLD;
                 }
             );
    +        return null;
         }
     }

It is not immediately obvious what happened. This is a rather interesting mutation operator: it replaces a function call in the expression "return functionCall();" with "functionCall(); return null;".

But why did such a mutation happen at all? Is it right to return null when we expect a filtered array? Of course not, and it happens because we did not declare the function's return type. The MF sees that the return value may be null and tries to slip one in. Infection is quite clever in this regard: if the function declares a specific, non-nullable return type (for example int), the code is not mutated this way. Analyzing this mutant, we conclude that a return type declaration should be added:

    - public function __invoke(array $collection)
    + public function __invoke(array $collection): array

Now the method signature is absolutely clear - we pass an array to the filter, we expect an array.
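A short sketch shows why a "return null" mutant is pointless once the return type is declared: PHP itself rejects the value at runtime. The mutatedFilter() function below is a hypothetical, hand-written version of the mutated method with the return type added; it is not generated by Infection.

```php
<?php
// Hypothetical hand-written version of the mutated method, with ": array" added.
function mutatedFilter(array $collection): array
{
    array_filter($collection, function (array $item) {
        return $item['age'] >= 18;
    });

    return null; // violates the declared non-nullable return type
}

try {
    mutatedFilter([['age' => 20]]);
    echo "no error\n";
} catch (TypeError $e) {
    // PHP 7 throws a TypeError when null is returned from an ": array" function.
    echo "TypeError\n";
}
```

Any test that calls the method would fail on such a mutant immediately, so generating it would only waste test runs.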

Run again and check the result:

As expected, the number of mutations has decreased thanks to the added return type, and all mutants are killed. Now we have not only Code Coverage 100%, but also Mutation Code Coverage 100%, which is a much more indicative criterion for the quality of your tests.

This simple example shows that even with 100% test coverage, mutation testing can still reveal problems and, as it were, cover your code "by more than 100%."

If you are not yet convinced, consider some more powerful mutation operators - PublicVisibility and ProtectedVisibility. They change the access modifier of each method of a class (except for some magic and abstract methods) from public to protected, and from protected to private, respectively.

This lets you check whether methods really need to be so visible. If such mutants survive, you can conclude that the public interface of your class is most likely redundant and can be reduced. In the case of the ProtectedVisibility operator, a surviving mutant says that the method should be made private, because no child class uses or overrides the parent's protected method.
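The idea behind a surviving visibility mutant can be sketched by hand. Greeter and GreeterMutant are hypothetical classes written for this illustration; the second one is what a public-to-protected mutation of the first would look like.

```php
<?php
// Hypothetical class: normalize() is public, but only used internally.
class Greeter
{
    public function greet(string $name): string
    {
        return 'Hello, ' . $this->normalize($name) . '!';
    }

    public function normalize(string $name): string
    {
        return ucfirst(trim($name));
    }
}

// The mutant: identical code, except normalize() is now protected.
class GreeterMutant
{
    public function greet(string $name): string
    {
        return 'Hello, ' . $this->normalize($name) . '!';
    }

    protected function normalize(string $name): string
    {
        return ucfirst(trim($name));
    }
}

// A test that only goes through greet() passes for both, so the mutant survives.
var_dump((new Greeter())->greet(' alice') === 'Hello, Alice!');       // bool(true)
var_dump((new GreeterMutant())->greet(' alice') === 'Hello, Alice!'); // bool(true)
```

If some test called normalize() directly from outside the class, the mutant would fail with a fatal error and be counted as defeated, which would indicate that the method is genuinely part of the public interface.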

For example, running Infection on the well-known FosUserBundle shows that it has a public method, isLegacy, whose visibility can be reduced.

    ./infection.phar --threads=4 --show-mutations --mutators=PublicVisibility,ProtectedVisibility

Besides surviving and killed mutants, there are other outcomes. For example, changing the unary operator "++" on a loop counter to "--" can produce a loop that never ends. The task of the mutation testing framework is to handle such situations correctly and mark the mutant with a special status - Timeout. This outcome is positive, and the mutant is not considered a survivor.

We have covered the theory; now let's look at Infection in more detail and at the alternatives available for PHP.

Infection PHP

To work, Infection requires PHP 7.0+ and the Xdebug extension, which it uses to collect Code Coverage.

The recommended installation method, with the possibility of automatic updating ( infection.phar self-update ), is the Phar archive.

Currently, two testing frameworks are supported out of the box - PHPUnit (5, 6+) and PhpSpec.

When you first run it from the root of your project, an infection.json.dist file will be created, which you can later commit to VCS. It lists the source folders to mutate, exclusions, the timeout value, etc.

Mutation testing as a whole requires human analysis; therefore, after a run completes, all generated mutations are logged to infection-log.txt in the same folder.

Options

Of the options Infection can be launched with, the following are the most interesting:

`--threads`

This is the number of threads that run the generated mutants in parallel. It significantly reduces execution time. There is a caveat, though: if your tests depend on each other or use a database, this option can cause numerous test failures, which will badly distort the resulting metrics. So it is worth at least checking the log during the early stages of adoption.

`--show-mutations`

Immediately prints diffs of the mutants that were not killed to the console, which lets you analyze the result right away and fix the tests as you write them.

`--mutators`

A list of the mutation operators to apply to the code. This is convenient if, for example, you want to check only the PublicVisibility and ProtectedVisibility operators.

 ./infection.phar --mutators=PublicVisibility,ProtectedVisibility

`--min-msi` and `--min-covered-msi`

These two options are useful if you run Infection as one of the build steps of your project on a Continuous Integration server.

--min-msi allows you to specify the minimum value (in percent) of the Mutation Score Indicator. If the actual value is lower than the specified one, the build fails. This option pushes you to cover more lines of code with every build.

--min-covered-msi accordingly allows you to specify the minimum value of Covered Code MSI. This option pushes you to write more effective and reliable tests with every build.

Both options can be used both individually and together.

 ./infection.phar --min-msi=80 --min-covered-msi=95

Use with Travis CI

    before_script:
      - wget https://github.com/infection/infection/releases/download/0.5.0/infection.phar
      - wget https://github.com/infection/infection/releases/download/0.5.0/infection.phar.pubkey
      - chmod +x infection.phar

    script:
      - ./infection.phar --min-covered-msi=90 --threads=4

Each release (Phar archive) is signed with a private OpenSSL key, so besides the archive itself, you also need to download the public key.

How to use mutation testing?

How can mutation testing be useful to you as a developer, at work or in personal projects? How do you introduce it into an existing project?

Daily use for developer

Mutation testing can be useful in daily work when writing new tests. The workflow looks like this:

You write new functionality, for example the same UserFilterAge from the example above.
The code is already covered by tests.
To test the tests, you run mutation testing for this file only:

 ./infection.phar --threads=4 --filter=UserFilterAge.php --show-mutations

Analyze the surviving mutants and try to achieve a good Covered Code MSI - that is, aim for 100% of the mutants generated for test-covered code to be killed. This will let you write tests as effectively as possible.

When using MT, you will notice that you write more concise code with more tests. In effect, you move toward branch coverage, where all the paths through your code are tested, instead of the usual line coverage.

Daily use in the project

Mutation testing can be used on a Continuous Integration server. Depending on the size of the project, it can be run either on every build or less often, for example once a day at night. The main thing here is to analyze the result and constantly improve the quality of the tests.

In my opinion, generating a report alone will not get you good results, so it is better to use the --min-msi and / or --min-covered-msi options.

For example, Infection runs mutation testing on itself on every build, and if the numbers drop, the build fails.

With continuous use of MT, the MSI figures in the project will grow, and you will be able to gradually raise the values of the --min-msi and --min-covered-msi options.

Why is it sometimes impossible to achieve 100% MSI?

In mutation testing there is the concept of equivalent (identical) mutants - mutations that produce code identical to the original in terms of logic. An example of such a mutation:

     public function calculateExpectedValueAt(DateTimeInterface $date)
     {
         $diffInDays = (int) $this->startedAt->diff($date)->format('%a');
         $multiplier = $this->initialValue < $this->targetValue ? 1 : -1;
         $initialAveragePerDay = $this->calculateInitialAveragePerDay();

    -    return $this->initialValue + ($initialAveragePerDay * $diffInDays * $multiplier);
    +    return $this->initialValue + ($initialAveragePerDay * $diffInDays / $multiplier);
     }

The point is that multiplying a number by ±1 and dividing it by ±1 give the same result, so such a mutant survives.

For this reason, you should not expect 100% MSI across an entire codebase in practice. That would require a powerful system for registering equivalent mutants and excluding them from the resulting metrics.

PHP Alternatives

The only fully working alternative to Infection in PHP is Humbug, which was in fact the first MF for PHP. Among its advantages is experimental support for mutation caching (incremental cache): if a file has not changed and none of the tests covering its lines were removed, then on the next run the mutants are not re-run and the results of the previous run are reused. In theory this can significantly speed things up, but it can also lead to false positives and errors in the metrics.

On the other hand, Humbug does not yet support PHPUnit 6+ or PhpSpec. However, the main difference between Infection and Humbug at the moment is that Infection works with an Abstract Syntax Tree (AST). Building the AST is possible thanks to Nikita Popov's wonderful project, PHP-Parser.

What does using an AST give us? Let's look more closely.

Without an AST, to mutate the code you have to:

split the code of the file into tokens (function token_get_all()) and put them into an array
walk through the array and, where needed, replace a token with another one according to the mutation operator
assemble new, mutated source code from the new set of tokens
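The three steps above can be sketched with PHP's built-in tokenizer. This is a deliberately naive illustration, not Humbug's actual code: the mutatePlusToMinus() function is a hypothetical name, it replaces every "+" at once (a real framework would create one mutant per occurrence), and it performs no validity checks at all.

```php
<?php
// Naive token-level "+" -> "-" mutator (illustration only, no validity checks).
function mutatePlusToMinus(string $source): string
{
    $mutated = '';

    // Step 1: split the source into tokens.
    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $text = is_array($token) ? $token[1] : $token;

        // Step 2: replace the token according to the mutation operator.
        if ($token === '+') {
            $text = '-';
        }

        // Step 3: assemble the mutated source from the token texts.
        $mutated .= $text;
    }

    return $mutated;
}

echo mutatePlusToMinus('<?php return $a + $b;'), PHP_EOL; // <?php return $a - $b;
```

A "+" inside a string literal is left alone, because the whole literal is a single array token, but the function has no idea what the operands of "+" are.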

Sample Tokens

    T_OPEN_TAG ('<?php ')
    T_BOOLEAN_AND ('&&')
    T_INC ('++')
    T_WHITESPACE (' ')
    ...

But in fact, the process is much more complicated, because the decision to change a token depends on several conditions:

Are we in a function body? Replacing T_OPEN_TAG ('<?php ') makes no sense.
Will the code be valid after mutation? For example, adding arrays, ['a'] + ['b'], is valid code, but subtracting arrays, ['a'] - ['b'], is a Fatal Error. Such a mutation must therefore not be made, and the MF has to check whether the addition token sits between arrays.

As a result, with an array of tokens it is rather difficult to answer these questions in code. With an abstract syntax tree, on the other hand, it is easy, because you work with objects that represent the source code (Node\Expr\BinaryOp\Plus, Node\Expr\BinaryOp\Minus, Node\Expr\Array_).

Here are the implementations of a mutation operator that changes "+" to "-" with checking arrays:

Infection

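    // PHP-Parser node classes used below
    use PhpParser\Node;
    use PhpParser\Node\Expr\Array_;
    use PhpParser\Node\Expr\BinaryOp;

    class Plus implements Mutator
    {
        public function mutate(Node $node)
        {
            return new BinaryOp\Minus($node->left, $node->right, $node->getAttributes());
        }

        public function shouldMutate(Node $node) : bool
        {
            if (!($node instanceof BinaryOp\Plus)) {
                return false;
            }

            // array + array is valid, array - array is a Fatal Error
            if ($node->left instanceof Array_ && $node->right instanceof Array_) {
                return false;
            }

            return true;
        }
    }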

Humbug

    class Addition extends MutatorAbstract
    {
        public static function getMutation(array &$tokens, $index)
        {
            $tokens[$index] = '-';
        }

        public static function mutates(array &$tokens, $index)
        {
            $t = $tokens[$index];

            if (!is_array($t) && $t == '+') {
                $tokenCount = count($tokens);

                for ($i = $index + 1; $i < $tokenCount; $i++) {
                    // check for short array syntax
                    if (!is_array($tokens[$i]) && $tokens[$i][0] == '[') {
                        return false;
                    }

                    // check for long array syntax
                    if (is_array($tokens[$i]) && $tokens[$i][0] == T_ARRAY && $tokens[$i][1] == 'array') {
                        return false;
                    }

                    // if we're at the end of the array
                    // and we didn't see any array, we
                    // can probably mutate this addition
                    if (!is_array($tokens[$i]) && $tokens[$i] == ';') {
                        return true;
                    }
                }

                return true;
            }

            return false;
        }
    }

Obviously, using an AST offers tremendous benefits: the code is easier to work with, maintain and understand, new mutation operators are easier to create, and the code is easier to analyze by walking the branches of the tree.

In general, mutation testing is one more tool for improving the quality of your tests and of the code as a whole, and it is worth paying attention to.

If you have experience using MT on real projects, or you try Infection and find interesting errors in your code, share any useful cases in the comments.

Source: https://habr.com/ru/post/334394/
