
Code coverage in Badoo

A few months ago, we cut code coverage generation time from 70 hours to 2.5 hours. We first implemented this as an additional export/import format for coverage data. Recently, our pull requests were merged into the official phpunit, phpcov and php-code-coverage repositories.

We have said many times at conferences and in articles that we run tens of thousands of unit tests in a short time. As you might guess, most of the speed comes from running the tests in parallel threads. That would be the end of the story, except that one important testing metric is how much of the code the tests cover.
Today we will explain how to collect coverage when tests run in multiple threads, how to aggregate it, and how to do all of that quickly. Without our optimizations, calculating coverage took more than 70 hours for the unit tests alone. After the optimization, it takes only 2.5 hours to calculate coverage for all unit tests plus two sets of integration tests, more than 30 thousand tests in total.

At Badoo we write tests in PHP and use the PHPUnit framework by Sebastian Bergmann (phpunit.de).
In this framework, as in many others, coverage is collected through simple calls to the Xdebug extension:

    xdebug_start_code_coverage();
    // …
    $codeCoverage = xdebug_get_code_coverage();
    xdebug_stop_code_coverage();

The output is a nested array that lists the files executed while coverage was being collected and, for each file, the line numbers with a flag showing whether the line was executed, was not executed, or could never have been executed. Details about how Xdebug handles coverage can be found on the project website.
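To make the shape of that array concrete, here is a small sketch that summarizes a hand-written sample. The sample data and the helper name summarizeCoverage are invented for illustration. The flag values (1 for executed, -1 for executable but not executed, -2 for code that cannot be executed) follow the Xdebug documentation, and as far as we know the negative flags only appear when coverage is started with the XDEBUG_CC_UNUSED and XDEBUG_CC_DEAD_CODE options.

```php
<?php
// Hand-written sample in the shape returned by xdebug_get_code_coverage().
// File names and line numbers are invented for illustration.
$sample = [
    '/app/User.php' => [10 => 1, 11 => 1, 12 => -1, 14 => -2],
    '/app/Cart.php' => [5 => 1, 6 => -1, 7 => -1],
];

/**
 * Counts executed and executable lines per file.
 * Flags: 1 = executed, -1 = executable but not executed, -2 = dead code.
 */
function summarizeCoverage(array $coverage): array
{
    $summary = [];
    foreach ($coverage as $file => $lines) {
        $executed = 0;
        $executable = 0;
        foreach ($lines as $flag) {
            if ($flag === -2) {
                continue; // dead code is not counted as executable
            }
            $executable++;
            if ($flag === 1) {
                $executed++;
            }
        }
        $summary[$file] = ['executed' => $executed, 'executable' => $executable];
    }
    return $summary;
}

foreach (summarizeCoverage($sample) as $file => $s) {
    printf("%s: %d of %d executable lines executed\n", $file, $s['executed'], $s['executable']);
}
```

With real data, the array passed to the helper would come straight from xdebug_get_code_coverage().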
Sebastian Bergmann also maintains a library called PHP_CodeCoverage, which is responsible for collecting, processing and rendering coverage in various formats. The library is convenient and extensible, and we are quite happy with it. It has a console front end, phpcov.
For convenience, PHPUnit itself already integrates coverage calculation and can output it in several formats:

    --coverage-clover <file>  Generate code coverage report in Clover XML format.
    --coverage-html <dir>     Generate code coverage report in HTML format.
    --coverage-php <file>     Serialize PHP_CodeCoverage object to file.
    --coverage-text=<file>    Generate code coverage report in text format.

The --coverage-php option is what we need for a multithreaded run: each thread collects coverage and exports it to a separate *.cov file. Aggregating the files and rendering a nice HTML report can then be done by calling phpcov with the --merge flag.

 --merge Merges PHP_CodeCoverage objects stored in .cov files. 

Everything looks smooth and elegant and should work out of the box. But apparently not everyone uses this mechanism, including the author of the library; otherwise the inefficiency of the export/import mechanism in PHP_CodeCoverage would have surfaced quickly. Let's go through the problem step by step.

Export to the *.cov format is handled by a dedicated reporter class, PHP_CodeCoverage_Report_PHP, whose interface is very simple: a single process() method that accepts a PHP_CodeCoverage object and serializes it with the serialize() function.

The result is written to a file if a target path is passed; otherwise it is returned from the method.

    class PHP_CodeCoverage_Report_PHP
    {
        /**
         * @param  PHP_CodeCoverage $coverage
         * @param  string           $target
         * @return string
         */
        public function process(PHP_CodeCoverage $coverage, $target = NULL)
        {
            $coverage = serialize($coverage);

            if ($target !== NULL) {
                return file_put_contents($target, $coverage);
            } else {
                return $coverage;
            }
        }
    }

The phpcov utility does the reverse on import: it takes every file with the *.cov extension in a directory and calls unserialize() on each one to get an object. That object is then passed to the merge() method of a PHP_CodeCoverage object, which accumulates the coverage.

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $coverage = new PHP_CodeCoverage;

        $finder = new FinderFacade(
            array($input->getArgument('directory')),
            array(),
            array('*.cov')
        );

        foreach ($finder->findFiles() as $file) {
            $coverage->merge(unserialize(file_get_contents($file)));
        }

        $this->handleReports($coverage, $input, $output);
    }

The merge itself is very simple. It is essentially an array_merge() of the data arrays with a few nuances, such as not duplicating entries that were already imported and skipping files excluded by the filters passed to the phpcov call (--blacklist and --whitelist).

    /**
     * Merges the data from another instance of PHP_CodeCoverage.
     *
     * @param PHP_CodeCoverage $that
     */
    public function merge(PHP_CodeCoverage $that)
    {
        foreach ($that->data as $file => $lines) {
            if (!isset($this->data[$file])) {
                if (!$this->filter->isFiltered($file)) {
                    $this->data[$file] = $lines;
                }

                continue;
            }

            foreach ($lines as $line => $data) {
                if ($data !== NULL) {
                    if (!isset($this->data[$file][$line])) {
                        $this->data[$file][$line] = $data;
                    } else {
                        $this->data[$file][$line] = array_unique(
                            array_merge($this->data[$file][$line], $data)
                        );
                    }
                }
            }
        }

        $this->tests = array_merge($this->tests, $that->getTests());
    }
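The effect of merge() is easier to see on plain arrays. The sketch below mirrors its inner loop with the filter check left out. It assumes the data maps file => line => list of test names that executed the line, with NULL for lines that have no executable code, which is what the $data !== NULL check in the listing suggests. The function name mergeCoverageData and the sample data are hypothetical.

```php
<?php
/**
 * Merges per-line test lists from $from into $into (filter check omitted).
 */
function mergeCoverageData(array $into, array $from): array
{
    foreach ($from as $file => $lines) {
        if (!isset($into[$file])) {
            $into[$file] = $lines; // file not seen yet: take it as is
            continue;
        }
        foreach ($lines as $line => $tests) {
            if ($tests === null) {
                continue; // nothing executable on this line
            }
            if (!isset($into[$file][$line])) {
                $into[$file][$line] = $tests;
            } else {
                // array_values() re-indexes the result; the original keeps the gaps.
                $into[$file][$line] = array_values(
                    array_unique(array_merge($into[$file][$line], $tests))
                );
            }
        }
    }
    return $into;
}

// Coverage as two threads might have reported it (invented sample).
$threadA = [
    '/app/User.php' => [10 => ['UserTest::testName'], 11 => [], 12 => null],
];
$threadB = [
    '/app/User.php' => [
        10 => ['UserTest::testName', 'CartTest::testOwner'],
        11 => ['CartTest::testOwner'],
        12 => null,
    ],
    '/app/Cart.php' => [5 => ['CartTest::testOwner']],
];

$merged = mergeCoverageData($threadA, $threadB);
print_r($merged);
```

Each line ends up with the union of the tests that touched it in any thread, so the order in which the per-thread files are merged does not change which tests are attributed to a line.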

It was precisely the serialization and deserialization approach that kept us from generating coverage quickly. The performance of the serialize and unserialize functions in PHP has been discussed in the community more than once, for example:
http://stackoverflow.com/questions/1256949/serialize-a-large-array-in-php
http://habrahabr.ru/post/104069

For our "small" project, whose PHP repository contains more than 35 thousand files, the coverage files are heavy: several hundred megabytes each. The combined file, merged from the different threads, weighs almost 2 gigabytes. On data of this size unserialize showed itself in all its glory: we waited several days for coverage to be generated.

So we decided to try the most obvious optimization: export the data with var_export and then load the resulting files with include.

To do this, we added a new reporter class to the php-code-coverage repository that exports in the new format via var_export:

    class PHP_CodeCoverage_Report_PHPSmart
    {
        /**
         * @param  PHP_CodeCoverage $coverage
         * @param  string           $target
         * @return string
         */
        public function process(PHP_CodeCoverage $coverage, $target = NULL)
        {
            $output = '<?php $filter = new PHP_CodeCoverage_Filter();'
                . '$filter->setBlacklistedFiles('
                . var_export($coverage->filter()->getBlacklistedFiles(), 1)
                . ');'
                . '$filter->setWhitelistedFiles('
                . var_export($coverage->filter()->getWhitelistedFiles(), 1)
                . ');'
                . '$object = new PHP_CodeCoverage(new PHP_CodeCoverage_Driver_Xdebug(), $filter); $object->setData('
                . var_export($coverage->getData(), 1)
                . '); $object->setTests('
                . var_export($coverage->getTests(), 1)
                . '); return $object;';

            if ($target !== NULL) {
                return file_put_contents($target, $output);
            } else {
                return $output;
            }
        }
    }

We modestly named the file format PHPSmart. Files in this format have the *.smart extension.
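If you want to compare the two loading paths on your own machine, a rough harness like the one below may help. It is only a sketch: the synthetic data, its size and the helper name makeFakeCoverage are made up, and the timings depend heavily on the PHP version, opcache settings and the shape of the data, so a small sample like this will not necessarily reproduce the gap we saw on gigabytes of real coverage.

```php
<?php
// Synthetic stand-in for coverage data: file => line => list of test names.
function makeFakeCoverage(int $files, int $linesPerFile): array
{
    $data = [];
    for ($f = 0; $f < $files; $f++) {
        for ($l = 1; $l <= $linesPerFile; $l++) {
            $data["/app/File{$f}.php"][$l] = ["Test{$f}::testLine{$l}"];
        }
    }
    return $data;
}

$data = makeFakeCoverage(200, 100);

// Write the same data in both formats.
$covFile = tempnam(sys_get_temp_dir(), 'cov');
$smartFile = tempnam(sys_get_temp_dir(), 'smart');
file_put_contents($covFile, serialize($data));
file_put_contents($smartFile, '<?php return ' . var_export($data, true) . ';');

// Path 1: read the file and unserialize it.
$start = microtime(true);
$fromSerialize = unserialize(file_get_contents($covFile));
$serializeTime = microtime(true) - $start;

// Path 2: let the PHP parser load the exported array.
$start = microtime(true);
$fromInclude = include $smartFile;
$includeTime = microtime(true) - $start;

printf("unserialize: %.4f s, include: %.4f s\n", $serializeTime, $includeTime);

unlink($covFile);
unlink($smartFile);
```

Both paths must return identical data; only the loading time should differ.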

To let a PHP_CodeCoverage object be exported to and imported from the new format, we added setters and getters for its properties.
After a few edits in the phpunit and phpcov repositories to teach them to work with such objects, our coverage started building in just two and a half hours.
Here is the import:

    foreach ($finder->findFiles() as $file) {
        $extension = pathinfo($file, PATHINFO_EXTENSION);

        switch ($extension) {
            case 'smart':
                $object = include($file);
                $coverage->merge($object);
                unset($object);
                break;

            default:
                $coverage->merge(unserialize(file_get_contents($file)));
        }
    }
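The export and import halves can be tried end to end without installing the library. The sketch below uses a hypothetical stand-in class, TinyCoverage, that has only the getters and setters the export needs; exportSmart and loadCoverage are likewise invented names, and the sample data is made up. It is meant to show the round trip, not to reproduce the real classes.

```php
<?php
// Hypothetical stand-in for PHP_CodeCoverage: only the accessors the export needs.
final class TinyCoverage
{
    private $data = [];
    private $tests = [];

    public function setData(array $data) { $this->data = $data; }
    public function getData(): array { return $this->data; }
    public function setTests(array $tests) { $this->tests = $tests; }
    public function getTests(): array { return $this->tests; }
}

// Export: generate PHP source that rebuilds the object through its setters.
function exportSmart(TinyCoverage $coverage, string $target): void
{
    $code = '<?php $object = new TinyCoverage();'
        . '$object->setData(' . var_export($coverage->getData(), true) . ');'
        . '$object->setTests(' . var_export($coverage->getTests(), true) . ');'
        . 'return $object;';
    file_put_contents($target, $code);
}

// Import: choose the loader by file extension, as in the loop above.
function loadCoverage(string $file): TinyCoverage
{
    if (pathinfo($file, PATHINFO_EXTENSION) === 'smart') {
        return include $file;
    }
    return unserialize(file_get_contents($file));
}

$original = new TinyCoverage();
$original->setData(['/app/User.php' => [10 => ['UserTest::testName'], 12 => null]]);
$original->setTests(['UserTest::testName' => 0]);

// Write the same object in both formats, then read both back.
$base = sys_get_temp_dir() . '/coverage-' . uniqid();
exportSmart($original, "{$base}.smart");
file_put_contents("{$base}.cov", serialize($original));

$fromSmart = loadCoverage("{$base}.smart");
$fromCov = loadCoverage("{$base}.cov");
var_dump($fromSmart == $original, $fromCov == $original);

unlink("{$base}.smart");
unlink("{$base}.cov");
```

Because the loader is chosen per file, old *.cov files and new *.smart files can sit in the same directory during a migration.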

You can find our edits on GitHub and try this approach on your project.
github.com/uyga/php-code-coverage
github.com/uyga/phpcov
github.com/uyga/phpunit

We sent Sebastian Bergmann pull requests with our changes, hoping to see them in the official repositories soon.
github.com/sebastianbergmann/phpunit/pull/988
github.com/sebastianbergmann/phpcov/pull/7
github.com/sebastianbergmann/php-code-coverage/pull/185

But he closed them, saying that he did not want an additional format: he wanted ours to replace his.



Which we happily did. Our changes are now included in the official repositories, replacing the format previously used in *.cov files.
github.com/sebastianbergmann/php-code-coverage/pull/186
github.com/sebastianbergmann/phpcov/pull/8
github.com/sebastianbergmann/phpunit/pull/989

This small optimization sped up coverage collection almost 30 times. It let us run not only the unit tests for coverage but also add two sets of integration tests, without any significant effect on the time spent on export, import and merging.



Ilya Ageev
QA Lead

Source: https://habr.com/ru/post/192538/

