Why unit tests do not work in scientific applications

In this article I want to share my experience of developing scientific applications and explain why Test-Driven Development and unit tests are not the panacea they have lately been made out to be, at least when it comes to finding software bugs. Why so?

The article consists of several sections:
  1. "Introduction", in which I explain why I decided to write it.
  2. "Illustrative algorithm", which briefly describes the scientific problem and the algorithm that will illustrate the discussion that follows.
  3. "Disadvantages of unit tests", which lays out the arguments explaining why unit tests fall short for scientific applications.
  4. "Conclusion".

Let me note right away that I am not considering the merits of TDD as a way of developing an architecture, but only from the point of view of testing the program.

Introduction


During the aeronautics week on Habr, the top article was Digital Airplanes [1], together with an entertaining comment on it [2] saying that 20 years ago there were neither unit tests nor, in general, normal development methods, hence the "triple-triple redundant architecture": in particular, the same program being written by three different teams in different languages in order to eliminate software errors.

However, according to an eyewitness account ([3]; by the way, this is another excellent book by the excellent scientist Richard Feynman, who has found a second life on Habr), if not Test-Driven Development (TDD), then at least manual testing was a very important stage of development.
That is why I wanted to analyze the approach to developing scientific applications and to share some thoughts on the subject, in particular, to say that TDD is not a silver bullet for finding errors, as one might think at first.

Illustrative algorithm


Over the last 20 years, an unusual method for modeling hydrodynamics has been developed: the Lattice Boltzmann Method (hereinafter LBM) [4], [5]. I took part in implementing this algorithm in C++ for the Max Planck Institute in Germany.

To forestall the inevitable questions: yes, commercial and open packages that use this method already exist [6]; but initially we planned to use a modification of the method that was supposed to give gains in performance and memory consumption. We gradually abandoned that modification (first for mass transfer, then for heat transfer), but by then a lot of code had been written, and we decided to continue with our own implementation.

Hydrodynamics, as is well known, is described by the Navier-Stokes equations. To model them you can use, for example, the Finite Element Method [7]. Unfortunately, it has drawbacks: the relative complexity of parallelization, the impossibility of modeling multiphase flows, and the difficulty of modeling flows in porous media. LBM is free of these deficiencies.

For rarefied gases the Boltzmann equation [8] is valid; it describes how the density of the particle velocity distribution at each point in space changes over time. Macroscopically, that is, after the transition to the macroscopic quantities of density and velocity, this equation is equivalent to the Navier-Stokes equations.

Now watch closely: even though the Boltzmann equation is not applicable to dense liquids, if we learn to model it, we can thereby also model the Navier-Stokes equations for those liquids. In other words, we replace the underlying level of abstraction (the microscopic equation for dense liquids) with a physically incorrect implementation (the microscopic equation for rarefied gases), but in such a way that the upper level (the macroscopic Navier-Stokes equations) is described correctly.

The next stage is the discretization of the Boltzmann equation: in time, in the spatial coordinates (which gives the spatial nodes of the simulation), and in the possible particle velocity directions at each spatial node. The directions are chosen in a special way and always point to neighboring nodes.
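
To make this concrete, here is a minimal sketch of the standard D2Q9 discretization (nine directions per node in two dimensions) with the textbook directions and weights, together with the transition back to macroscopic quantities. This is a generic illustration, not an excerpt from our actual code.

    // A minimal sketch of the standard (textbook) D2Q9 discretization:
    // nine directions per node, each pointing at the node itself or a neighbor.
    #include <array>

    // Discrete directions e_i: the rest vector, the four axis-aligned
    // neighbors, and the four diagonal neighbors.
    constexpr std::array<std::array<int, 2>, 9> e = {{
        { 0,  0},                                   // rest
        { 1,  0}, { 0,  1}, {-1,  0}, { 0, -1},     // axis-aligned
        { 1,  1}, {-1,  1}, {-1, -1}, { 1, -1}      // diagonal
    }};

    // The textbook D2Q9 lattice weights w_i.
    constexpr std::array<double, 9> w = {
        4.0 / 9.0,
        1.0 / 9.0,  1.0 / 9.0,  1.0 / 9.0,  1.0 / 9.0,
        1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0
    };

    // Transition to macroscopic quantities: density is the sum of the
    // populations f_i, and momentum is the sum of e_i * f_i.
    void macroscopic(const std::array<double, 9>& f,
                     double& rho, double& ux, double& uy) {
        rho = 0.0; ux = 0.0; uy = 0.0;
        for (int i = 0; i < 9; ++i) {
            rho += f[i];
            ux  += e[i][0] * f[i];
            uy  += e[i][1] * f[i];
        }
        ux /= rho;
        uy /= rho;
    }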

Thus, the features critical for the examples that follow are:
  1. The algorithm uses complex formulas.
  2. The algorithm is nonlocal: to compute the state of the system in a given node, you need information from the previous step about the neighboring nodes.

Disadvantages of unit tests


Finally, we come to why unit tests cannot fully cover a complex scientific application.
The main idea of this section is that complex algorithms get in the way of testing... themselves.

Lack of simple tests

For complex algorithms and methods it is hard to come up with simple tests.

This, in general, is obvious. It is often difficult to choose simple input parameters for a formula so that the output is a simple and understandable number. Of course, you can work out separately, on a piece of paper or in MATLAB, which parameters to feed to the method under test in order to get, say, exactly one at the output; but both the input parameters and that output will look like magic numbers. An example is the Maxwell distribution [9], which the LBM algorithm uses in a slightly modified form.
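
For illustration, here is roughly what the discretized Maxwell (equilibrium) distribution looks like in its textbook D2Q9 form, reusing e and w from the sketch above; our modified variant differed, so treat this as a generic example. Try to pick rho, ux, uy so that the output is a "nice" number: you end up with magic numbers on both ends.

    // Textbook D2Q9 equilibrium (discretized Maxwell) distribution,
    // second order in velocity, with lattice speed of sound c_s^2 = 1/3:
    //   f_i^eq = w_i * rho * (1 + 3(e_i.u) + 4.5(e_i.u)^2 - 1.5(u.u))
    double equilibrium(int i, double rho, double ux, double uy) {
        const double eu = e[i][0] * ux + e[i][1] * uy;  // e_i . u
        const double uu = ux * ux + uy * uy;            // u . u
        return w[i] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu);
    }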

Small coverage area

Even if you manage to come up with a simple test for a complex formula or algorithm, that test will most likely fail to find most of the errors.

Indeed, sometimes a simple and easy-to-read test can be thought up: for example, if you are testing whether you wrote the rotation matrices in your 3D engine correctly (I suspect many Habr residents have indulged in this), then a rotation by zero angles along all axes should return the identity matrix. However, such tests often fail to detect most errors. For example, if you start using the tangent of the rotation angle instead of the sine in the elements of the rotation matrix, this simple test will still pass.
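
A hypothetical sketch of this situation: the zero-angle test passes despite the sine/tangent bug, because tan(0) and sin(0) are both zero.

    #include <array>
    #include <cassert>
    #include <cmath>

    // Hypothetical 2D rotation matrix with the bug described above:
    // tangent instead of sine in the off-diagonal elements.
    std::array<double, 4> rotation(double angle) {
        return {  std::cos(angle), -std::tan(angle),   // should be -sin(angle)
                  std::tan(angle),  std::cos(angle) }; // should be  sin(angle)
    }

    // The "simple" unit test: zero angle must give the identity matrix.
    // It passes anyway, since tan(0) == sin(0) == 0.
    void test_zero_angle_gives_identity() {
        const auto m = rotation(0.0);
        assert(m[0] == 1.0 && m[1] == 0.0 && m[2] == 0.0 && m[3] == 1.0);
    }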

Another example: you need to implement boundary conditions for the LBM algorithm. These conditions complete the state of a node when it is not entirely surrounded by "liquid" neighbors (for example, when it sits at a wall). Here, too, a simple test comes to mind: if all the neighbors are in equilibrium states (that is, their velocities are distributed according to the Maxwell law), then after an iteration the boundary node will also be in equilibrium. It turned out, however, that this test misses most of the errors...

One could object that in business applications tests are not meant to cover all possible input parameters either. The difference, however, is that scientific applications usually operate on sets of a different cardinality [10]. Business applications mostly work with finite or countable sets (of finite cardinality or cardinality aleph-null: boolean variables, strings, integers), while scientific ones work with uncountable sets (real or complex numbers).

Slow error detection

Because the typical sets are uncountable, it is hard to detect an error right away.

The point is that if you mix up "or" and "and" in a method that uses only boolean variables, you will immediately get something wrong at its output. But if you mix up plus and minus in a formula that works with small values of the input parameters, you may well fail to detect the error after a single call of the method containing that formula. You will need to call the method several hundred thousand times before the discrepancy shows up. It is also worth remembering that a test working with real numbers needs to guard against false positives.
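
A minimal sketch of one common way to avoid such false positives: compare with a relative tolerance (plus an absolute floor for values near zero) instead of exact equality. The tolerance values here are arbitrary placeholders.

    #include <algorithm>
    #include <cmath>

    // Compare two reals with a relative tolerance plus an absolute floor
    // for values near zero; the tolerance values are placeholders.
    bool nearlyEqual(double a, double b,
                     double relTol = 1e-9, double absTol = 1e-12) {
        const double scale = std::max(std::abs(a), std::abs(b));
        return std::abs(a - b) <= std::max(absTol, relTol * scale);
    }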

Example: in one of the important loops of the LBM algorithm, continue was confused with break. Nevertheless, all the tests passed. The error was detected only during manual verification of the algorithm on a simple system, the Poiseuille flow [11]: after 100,000 iterations, or 5 minutes (!) of simulating a 32x64 node system (starting from a stationary state), the velocity distribution turned out to be slightly asymmetric (the maximum relative error was about 1e-10).
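
A hypothetical reconstruction of what such a mix-up looks like (this is not the actual production code; the types are made up for illustration):

    struct Node {
        bool   blocked[9];  // which directions point into a wall
        double f[9];        // particle populations, one per direction
    };

    // The intent was to skip directions that point into a wall.
    // `break` instead silently drops every remaining direction of the
    // node, a bias so small that all the unit tests kept passing.
    void streamNode(Node& node) {
        for (int i = 0; i < 9; ++i) {
            if (node.blocked[i])
                break;  // bug: should have been `continue`
            // ... stream population f[i] to the neighboring node ...
        }
    }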

Lack of modularity

For many methods it is impossible to create unit tests.

Complex physical modeling algorithms are rarely local. Each spatial node of the simulation can interact with the neighboring nodes; there is no way around this, and it makes unit testing impossible.

For example, to test a typical LBM algorithm step, or typical boundary conditions, in two-dimensional space you need a system of at least 3x3 nodes. You have to specify their densities, velocities, and temperatures, carry out preliminary calculations, and set up the modeling context (various providers of physical properties and models, and so on). As a result, the modularity of the test gradually disappears into the haze of integration.
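
A sketch of what the "arrange" step of such a test ends up looking like; every class and method name below is hypothetical, not our actual API, so this only shows the shape of the setup burden.

    // By the time the setup is done, this is an integration test
    // in everything but name. All names below are hypothetical.
    void test_boundary_node_stays_in_equilibrium() {
        Lattice lattice(3, 3);                  // minimal 3x3 system
        lattice.markWall(/*column=*/0);         // a wall along one edge
        for (Node& node : lattice)
            node.initEquilibrium(/*rho=*/1.0, /*ux=*/0.0, /*uy=*/0.0);
        lattice.setContext(defaultPhysicsProviders());  // property providers
        lattice.step();                         // one iteration
        // assert: the boundary node is still in equilibrium (within tolerance)
    }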

Another (less characteristic) example: to test the collision algorithm in a game physics engine, you must specify at least two bodies that will collide, along with their coordinates, velocities, geometry, and so on, which again turns the unit test into an integration test.

Large facades

If the algorithm is nontrivial, then the single public method of the class implementing it can hide 10-15 private methods and 300-400 lines of complex code [12].

That's right: 300-400 lines of harsh, merciless, unforgiving code. This class can contain half a dozen variables with names like a, b, gamma; calls to routines from the LAPACK [13] and BLAS libraries; random number generation; and who knows what else. With all your heart you may wish to give the variable "a" a normal name, because you know that in a corporate application you would long since have been scolded for a name like that; but you cannot, because that is what it is called in the original article, and because it has no obvious meaning of its own. The best you can do is put a link to the article at the top of the class.

And even if you do come up with a simple unit test for this single public method, with simple input parameters and a simple output value, and at some point it stops passing, you will not be able to find the error in this class even by selfless debugging or by the last-resort method of attentive reading.

After running into such a situation, the best solution turned out to be rewriting the algorithm in MATLAB and comparing the calculations line by line.

Conclusion


I hope I have managed to present sufficiently strong arguments and to convince the readers of Habr that writing the same program in different languages by different teams is often an excellent, and in the case of life-critical systems inevitable, addition to unit testing, TDD, Interface-Driven Development, Agile, and other modern techniques.

Thanks for reading!

Links


[1] Digital Airplanes
[2] Comment on the article Digital Airplanes
[3] "What Do You Care What Other People Think?", the appendix on the reliability of the Shuttle
[4] Lattice Boltzmann method
[5] Books on LBM; some are available through Google Books
[6] Parallel Lattice Boltzmann Solver
[7] Finite element method
[8] Boltzmann equation
[9] Maxwell-Boltzmann distribution
[10] Cardinality
[11] Poiseuille flow
[12] Facade pattern
[13] LAPACK

PS I realize that for most readers of this site the algorithm is somewhat exotic; still, I think it is the best illustration. After all, what would an article about scientific algorithms be without a scientific algorithm?

PPS I do not give links to the scientific articles themselves, because they are all available only by subscription.

PPPS Thanks to user SychevIgor for the invite!

UPD


After reading the comments, I think the right approach is to get rid of the phobia of magic numbers and to write, for every module (like BoltzmannDistributionProvider), table-driven tests that compare against calculations from MATLAB or against a known-good working state of the application. Similar system tests should be added as well (for example, compute the dynamics of a 5x5 node system once, write the result to a file, and compare the result of every subsequent run against this file). Thank you all for the tips!
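
For example, such a table-driven test might look like the sketch below, reusing equilibrium() from the earlier sketch; the single table row shown is the exact rest-state value, and the remaining rows would be exported from MATLAB.

    #include <cassert>
    #include <cmath>

    // Table-driven test: inputs and expected outputs precomputed in MATLAB
    // and embedded as a table, magic numbers and all.
    struct Case { double rho, ux, uy; int i; double expected; };

    void test_equilibrium_against_matlab_table() {
        const Case cases[] = {
            // rho,  ux,  uy,  i, expected
            { 1.0, 0.0, 0.0, 0, 4.0 / 9.0 },  // rest state, exact value
            // ... more rows exported from MATLAB ...
        };
        for (const Case& c : cases) {
            const double got = equilibrium(c.i, c.rho, c.ux, c.uy);
            assert(std::abs(got - c.expected) <=
                   1e-12 * std::abs(c.expected) + 1e-15);
        }
    }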

Source: https://habr.com/ru/post/92038/

