Mutation analysis, or how to test tests

You can never have too many tests, everyone knows that. Memes about unit and integration testing are no longer very funny. Yet we still do not know whether passing tests can be relied on, or what coverage percentage keeps bugs out of production. If fatal changes in the code slip past the tests without affecting their result, the solution suggests itself: you need to test the tests!

Mark Langovoy's talk at Frontend Conf covered an approach to automating this task. The video and the article are short and the ideas are practical, so take note.

About the speaker: Mark Langovoy (marklangovoi) works at Yandex on the Yandex.Toloka project. This is a crowdsourcing platform for quickly labeling large amounts of data. Customers upload data that, for example, needs to be prepared for use in machine learning algorithms, and set a price; on the other side, performers complete the tasks and earn money.

In his free time, Mark helps grow the Krasnodar developer community Krasnodar Dev Days, one of the 19 IT communities whose activists we invited to Frontend Conf in Moscow.

Testing

There are different types of automated testing.

In unit testing, the most popular kind, we write tests for small parts (modules) of an application. They are easy to write, but when the modules are integrated with each other they may not behave exactly as we expected.

To avoid this, we can write integration tests that will check the work of our modules together.

They are a bit more complicated, so today we will focus on unit testing.

Unit testing

Any project that wants at least minimal stability writes unit tests.

Consider an example.

    class Signal {
        on(callback) { ... }

        off(callback) {
            const callbackIndex = this.listeners.indexOf(callback);

            if (callbackIndex === -1) {
                return;
            }

            // Keep everything before and after the removed callback.
            this.listeners = [
                ...this.listeners.slice(0, callbackIndex),
                ...this.listeners.slice(callbackIndex + 1)
            ];
        }

        trigger() { ... }
    }

There is a Signal class. It is an event emitter with an on method for subscribing and an off method for unsubscribing: we check whether the callback is in the array of subscribers and, if so, remove it. And, of course, there is a trigger method that calls the subscribed callbacks.

We have a simple test for this example that calls the on and off methods and then the trigger to verify that the callback was not called after the unsubscribe.

    test('off method should remove listener', () => {
        const signal = new Signal();
        let wasCalled = false;

        const callback = () => {
            wasCalled = true;
        };

        signal.on(callback);
        signal.off(callback);

        signal.trigger();

        expect(wasCalled).toBeFalsy();
    });

Quality assessment criteria

What are the criteria for assessing the quality of such a test?

Code coverage is the most popular and best-known criterion. It shows what percentage of the lines of code were executed when the tests ran.

You can have 70%, 80%, or even 90% code coverage, but does that mean everything will be fine when you ship a new build to production, or can something still go wrong?

Let's return to our example.

It is Friday night, you are tired and finishing yet another feature. Then you come across this code, written by a colleague. Something about it looks complicated and scary to you.

    ...this.listeners.slice(0, callbackIndex),
    ...this.listeners.slice(callbackIndex + 1)

You decide that you can probably just clear the array:

    class Signal {
        ...

        off(callback) {
            const callbackIndex = this.listeners.indexOf(callback);

            if (callbackIndex === -1) {
                return;
            }

            this.listeners = [];
        }

        ...
    }

You commit, build the project, and ship it to production. The tests passed, so why not? Then you head to a bar to relax.

But suddenly, late at night, the phone rings. People are screaming that everything is down, users cannot use the product, and the business is losing money! You are in trouble and facing dismissal.

How to deal with this? What to do with the tests? How to catch such primitive stupid mistakes? Who will test the tests?

Of course, you can hire an army of QA engineers and let them sit and click through the application.

Or hire QA automation engineers. They can be given the job of writing tests: why write them yourself if there are dedicated people for this?

But in practice that is expensive, so today we will talk about mutation analysis, also known as mutation testing.

Mutation Testing

This is a way to automate the process of testing our tests. Its goal is to identify ineffective and incomplete tests; in essence, it is testing of tests.

The idea is to change pieces of the code and run the tests against them. If the tests do not fail, they are incomplete.

Changes are made by special operations called mutators. They replace, for example, plus with minus or multiplication with division. Mutators can also alter larger pieces of code: change the condition of a while loop, or reset an array instead of adding an element to it.
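To make the idea of a mutator concrete, here is a minimal sketch. It is not how Stryker is implemented: real tools work on the syntax tree rather than on raw text. The helper name generateMutants and the replacement table are assumptions made for this illustration.

```javascript
// Hypothetical replacement table: each arithmetic operator maps to its "opposite".
const OPERATOR_SWAPS = new Map([
  ['+', '-'],
  ['-', '+'],
  ['*', '/'],
  ['/', '*'],
]);

// Returns one mutant per operator occurrence. Each mutant differs from the
// original source in exactly one character.
function generateMutants(source) {
  const mutants = [];
  for (let i = 0; i < source.length; i++) {
    const replacement = OPERATOR_SWAPS.get(source[i]);
    if (replacement !== undefined) {
      mutants.push(source.slice(0, i) + replacement + source.slice(i + 1));
    }
  }
  return mutants;
}

// Prints two mutants: one with "*" replaced by "/", one with "+" replaced by "-".
console.log(generateMutants('return price * count + shipping;'));
```

Applying one small change per mutant matters: when a mutant survives, you know exactly which change the tests failed to notice.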

As a result of applying mutations, the source code mutates and becomes a mutant.

Mutants are divided into two categories:

Killed: those in which we managed to detect a deviation, that is, at least one test failed.
Survivors: those that got away from us and carried the bug into production.
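The difference between a killed mutant and a survivor can be sketched with the Signal example. This is an illustration under assumptions: the bodies of on and trigger are not shown in the article and are guessed here, off is written so that it removes exactly one listener, the mutant is written by hand, and the tests are plain functions returning true or false instead of Jest tests.

```javascript
// Assumed minimal implementation; on() and trigger() are guesses.
class Signal {
  constructor() {
    this.listeners = [];
  }
  on(callback) {
    this.listeners.push(callback);
  }
  off(callback) {
    const callbackIndex = this.listeners.indexOf(callback);
    if (callbackIndex === -1) return;
    this.listeners = [
      ...this.listeners.slice(0, callbackIndex),
      ...this.listeners.slice(callbackIndex + 1),
    ];
  }
  trigger() {
    this.listeners.forEach((listener) => listener());
  }
}

// Hand-written mutant: off() clears the whole array, as in the Friday-night change.
class MutantSignal extends Signal {
  off(callback) {
    if (this.listeners.indexOf(callback) === -1) return;
    this.listeners = [];
  }
}

// The test from the article: only checks that the removed callback is not called.
function weakTest(SignalClass) {
  const signal = new SignalClass();
  let wasCalled = false;
  const callback = () => { wasCalled = true; };
  signal.on(callback);
  signal.off(callback);
  signal.trigger();
  return wasCalled === false;
}

// A stronger test: also checks that the other listener is still called.
function strongTest(SignalClass) {
  const signal = new SignalClass();
  const calls = [];
  const first = () => calls.push('first');
  const second = () => calls.push('second');
  signal.on(first);
  signal.on(second);
  signal.off(first);
  signal.trigger();
  return calls.length === 1 && calls[0] === 'second';
}

console.log(weakTest(Signal), weakTest(MutantSignal));     // true true: the mutant survives
console.log(strongTest(Signal), strongTest(MutantSignal)); // true false: the mutant is killed
```

The weak test passes on both versions, so it cannot tell the broken off from the correct one. The stronger test fails on the mutant, which is exactly what "killing" it means.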

To assess quality there is the MSI (Mutation Score Indicator) metric: the percentage of generated mutants that the tests killed. The larger the gap between code coverage and MSI, the less the coverage percentage says about how effective the tests actually are.
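As a small worked example, MSI can be computed as the share of killed mutants among all generated ones. The function name and the numbers below are made up for illustration and are not results from the talk; real tools may also account for mutants that timed out or caused errors.

```javascript
// MSI as the percentage of generated mutants that the tests killed.
function mutationScoreIndicator(killed, survived) {
  const total = killed + survived;
  if (total === 0) {
    return 0; // assumed convention for "no mutants generated"
  }
  return (killed / total) * 100;
}

// Illustrative numbers: 30 killed and 10 survived mutants.
console.log(mutationScoreIndicator(30, 10)); // 75
```

With 90% line coverage and an MSI of 75%, for instance, a quarter of the injected changes went unnoticed even though most lines were executed.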

That was a bit of theory. Now let's see how it can be applied in JavaScript.

JavaScript solution

In JavaScript, there is only one actively developed tool for mutation testing: Stryker. It is named after the X-Men character William Stryker, the creator of the Weapon X program and an enemy of all mutants.

Stryker is not a test runner like Karma or Jest, nor is it a test framework like Mocha or Jasmine. It is a mutation testing framework that complements your existing infrastructure.

Plugin system

Stryker is very flexible. It is built entirely on a plugin system, and most of the plugins are written by the Stryker developers themselves.

There are plugins for running tests with Jest, Karma, and Mocha. There are integrations with the Mocha (stryker-mocha-framework) and Jasmine (stryker-jasmine) frameworks, and ready-made sets of mutators for JavaScript, TypeScript, and even Vue:

stryker-javascript-mutator;
stryker-typescript;
stryker-vue-mutator.

Mutators for React are included in stryker-javascript-mutator. In addition, you can always write your own mutators.

If you need to convert the code before launch, you can use plugins for Webpack, Babel or TypeScript.

This is all relatively simple.

Configuration

Configuration is not difficult: in a JSON config you only need to specify which test runner (and/or test framework, and/or transpiler) you use, and install the corresponding plugins from npm.

The simple console utility stryker-cli can do all this for you in question-and-answer mode. It asks what you are using and generates the configuration itself.
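As a rough idea of what such a config might contain, here is a sketch. The article mentions a JSON config; the JavaScript form is used here only to keep the examples in one language. The option names mutate, testRunner, and reporters follow recent Stryker documentation and may differ in the version discussed in the talk, and the file globs are hypothetical.

```javascript
// stryker.conf.js (sketch; option names assumed from recent Stryker docs)
const strykerConfig = {
  // Hypothetical globs: mutate source files, but not the tests themselves.
  mutate: ['src/**/*.js', '!src/**/*.test.js'],
  // Which test runner plugin executes the tests against each mutant.
  testRunner: 'jest',
  // Console output plus an HTML report.
  reporters: ['clear-text', 'progress', 'html'],
};

module.exports = strykerConfig;
```

In practice it is safer to let the CLI generate this file, since it knows the exact option names for the installed version.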

How it works

The life cycle is simple and consists of the following steps:

Reading and analyzing the config. Stryker loads the config and checks it for plugins, settings, file exclusions, and so on.
Loading plugins according to the config.
Running the tests on the original code to check that they currently pass (they may already be broken).
If everything is fine, generating a set of mutants for the files that we have allowed to mutate.
Running the tests on the mutants.
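The last three steps of this life cycle can be sketched as a small loop. This is a toy model under stated assumptions, not Stryker's code: the mutants are written by hand instead of being generated, a test suite is modeled as a function that returns true when all checks pass, and every name here is hypothetical.

```javascript
// Code under test: a user is an adult at 18 or older.
const original = (age) => age >= 18;

// Hand-written mutants; a real tool generates these automatically.
const mutants = [
  { name: '>= replaced with >', fn: (age) => age > 18 },
  { name: '>= replaced with <', fn: (age) => age < 18 },
  { name: 'condition replaced with true', fn: () => true },
];

// A test suite that forgets the boundary value 18.
const tests = (isAdult) => isAdult(30) === true && isAdult(5) === false;

function runMutationTesting(originalFn, mutantList, testSuite) {
  // Dry run: the tests must pass on the unmodified code first.
  if (!testSuite(originalFn)) {
    throw new Error('Tests fail on the original code; fix them first.');
  }
  // Run the same tests against every mutant; a passing suite means the mutant survived.
  const survived = mutantList.filter((mutant) => testSuite(mutant.fn));
  return {
    killed: mutantList.length - survived.length,
    survived: survived.map((mutant) => mutant.name),
  };
}

console.log(runMutationTesting(original, mutants, tests));
// { killed: 2, survived: [ '>= replaced with >' ] }
```

The surviving mutant points directly at the missing check: adding an assertion for age 18 would kill it.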

An example Stryker run looks like this:

Stryker runs;
reads the config;
loads the necessary dependencies;
finds files that will mutate;
runs tests on the source code;
creates 152 mutants;
runs tests in 8 threads (in this case, based on the number of CPU cores).

This is not a fast process, so it is better to run it on a CI/CD server.

After all the tests have run, Stryker prints a brief per-file report with the number of created, killed, and surviving mutants, the percentage of killed mutants (MSI), and the mutators that were applied.

The surviving mutants are potential problems that our tests do not anticipate.

Summary

Mutation testing is useful and interesting. It can find problems at an early stage of testing, without human involvement. It reduces the time needed to review a pull request, for example, because qualified developers do not have to spend time on a pull request that already has potential problems. Or it can save production if you decide to ship a new release on a Friday night.

Stryker is a flexible, multithreaded mutation testing tool. It is actively developed but still immature and has not yet reached a major version. For example, while this talk was being prepared, its developers finally made it possible to specify a configuration file in the Babel plugin and fixed the Jest integration. It is an open source project that you can help grow.

Questions and Answers

- How do you test the mutation tests? Surely they can contain errors too. In the first example, unit tests covered 90%. Everything seemed fine, yet cases still slipped through where everything fell over and was on fire. So why should we feel that everything is fine once those tests are covered by mutation tests?

- I'm not saying that mutation testing is a silver bullet that cures everything. Naturally, there may be some crazy edge cases, or a mutator may be missing. But first of all, typical errors are caught easily. For example, you wrote an age check as < 18 (it should have been <=) and forgot to test the boundary case. A mutator swaps in a different comparison, the test fails (or does not), and you see whether things are good or bad. Such things are caught quickly. It is simply a way to finish your tests properly and find what was missed.

- Do you often have a "deployed and left" situation? I think that is wrong.

- No, but I think such things still happen on many projects. Naturally, it is wrong. Many people believe that code coverage checks everything, so you can safely leave and not worry, but that is not the case.

- I'll tell you right away what the problem is. We have a lot of reducers and other things that we mutation-test. It all keeps growing, and mutation testing runs on every pull request, which takes a lot of time. Is it possible to run it only on what has changed?

- I think you can set that up yourself. For example, on the developer's side you can configure lint-staged so that on commit or push it runs only on the files that changed. The same is possible on CI/CD. In our case the project is very large and old, so we practice spot checks. We do not check everything, because that would take a week and produce hundreds of thousands of mutations. I would recommend doing spot checks or organizing a selective launch process yourself. I have not seen a ready-made tool for this kind of integration.

- Is the full set of possible mutations guaranteed for a given code fragment? If not, how exactly are the mutations chosen?

- I have not checked this personally, but I have not run into problems with it. Stryker should generate all possible mutations for a given code fragment.

- I want to ask about snapshots. I have a unit test that checks the logic, including a snapshot of a React component's markup. Naturally, if I change any logical construct, the markup changes right away. That is the expected behavior, isn't it?

- Yes, that is the point of snapshots: you update them manually yourself.

- So you somehow ignore snapshots in this report?

- Most likely, the snapshots should be updated first, and only then should mutation testing be run. Otherwise there will be a lot of garbage from Stryker.

- A question about CI servers. For plain unit tests there are reporters, for GitLab or anything you like, that show the percentage of passed tests, and you can configure whether that counts as a fail or not. What about Stryker? It just prints a table in the console, but what do you do with it next?

- They have an HTML reporter, and you can write your own reporters; everything is flexibly customizable. There may be some dedicated tools, but since we still only do spot-check mutation testing, I have not found specific integrations with TeamCity and similar CI/CD tools.

- How much do mutation tests increase the overall cost of maintaining the tests you have? Tests are a pain: they have to be rewritten when the code is rewritten, and so on. Sometimes it is easier to rewrite the code than the tests. And now I have mutation tests on top. How expensive is this for the business?

- First, I would probably point out that rewriting code for the sake of tests is wrong. Code should be easy to test. As for the extra work, it again matters to the business that tests are as complete and effective as possible. If they are incomplete, there may be a bug that causes losses. Naturally, you can test only the parts that matter most to the business.

- Still, how much more expensive does it get once mutation tests appear, compared with not having them?

- It is as expensive as your tests are bad right now. If the tests are poorly written today, you will have to write a lot more. Mutation testing will find the cases that the tests do not cover.

- There are many warnings on the slide with the Stryker results; some are critical and some are not. How do you handle false positives?

- What counts as a false positive is a subtle question. I asked the people on our team whether they had seen anything interesting among such errors. One example was about an error message. Stryker reported that the tests did not react when the error text changed. It is a flaw of sorts, but a minor one.

- So you see such errors and skip non-critical ones in manual mode?

- We have a spot check, so yes.

- I have a practical question. When you introduced it, what percentage of tests did you have?

- We did not introduce it across the whole project, and on the new project there were only minor problems. So I cannot give exact figures, but overall the approach has definitely improved the situation.

You can watch other front-end talks on our YouTube channel; talks from all our conferences gradually appear there. Or subscribe to the newsletter, and we will keep you posted on new materials and news about upcoming conferences.

Source: https://habr.com/ru/post/421141/
