How to stop worrying and start writing tests based on properties

Recently, references to a certain magic tool, property based testing (property based testing, if you need to google English-language literature), are becoming increasingly common. Most of the articles on this topic talk about what a cool approach this is, then they use an elementary example to show how to write such a test using some specific framework, at best, suggest several frequently encountered properties, and ... that’s all. Further, the amazed and enthusiastic reader tries to put all this into practice, and rests on the fact that the properties are somehow not invented. And unfortunately often gives up on this. In this article I will try to prioritize a little differently. I will start all the same with a more or less specific example to explain what kind of animal it is. But an example, I hope, is not quite typical for this kind of articles. Then I will try to make out some of the problems associated with this approach, and how they can be solved. But then - properties, properties, and only properties, with examples where they can be stuck. Interesting?

Test the key-value store in three short tests.

So, let's say for some reason we need to implement some kind of key-value storage. It can be a dictionary based on a hash table, or based on a tree, it can be stored entirely in memory, or be able to work with the disk - we do not care. The main thing is that it should have an interface that allows you to:

write value by key
check if there is an entry with the necessary key
read value by key
get list of recorded items
get a copy of the vault

With the classical approach based on examples, a typical test would look something like this:

storage = Storage() storage['a'] = 42 assert len(storage) == 1 assert 'a' in storage assert storage['a'] == 42

Or so:
')

 storage = Storage() storage['a'] = 42 storage['b'] = 73 assert len(storage) == 2 assert 'a' in storage assert 'b' in storage assert storage['a'] == 42 assert storage['b'] == 73

And in general, such tests can and should be written a little more than dofig. Moreover, the more complex the internal implementation, the greater the chance of something to miss. In short, a long, tedious and often ungrateful job. How nice it would be to shove her at someone! For example, to force the computer to generate test cases for us. To begin, let's try to do something like this:

 storage = Storage() key = arbitrary_key() value = arbitrary_value() storage[key] = value assert len(storage) == 1 assert key in storage assert storage[key] == value

Here is the first test based on properties. It looks almost the same as the traditional one, although a small bonus is already striking - there are no values taken from the ceiling, instead we use functions that return arbitrary keys and values. There is another, much more serious advantage - it can be performed many, many times and on different input data to check the contract, that if you try to add an element to an empty storage, it will really be added there. Okay, that's all well and good, but so far it looks not very useful compared to the traditional approach. Let's try to add another test:

 storage = arbitrary_storage() storage_copy = storage.copy() assert len(storage) == len(storage_copy) assert all(storage_copy[key] == storage[key] for key in storage) assert all(storage[key] == storage_copy[key] for key in storage_copy)

Here, instead of taking an empty repository, we generate an arbitrary one with some data, and check that its copy is identical to the original. Yes, the generator must also be written using a potentially buggy public API, but as a rule it is not such a difficult task. At the same time, if there are any serious bugs in the implementation, then the chances are high that the drops will start in the generation process, so this can also be considered a kind of bonus smoke-test. But now we can be sure that everything that could generate a generator can be correctly copied. And thanks to the first test, we know for sure that a generator can create a storage with at least one element. Time for the next test! At the same time reuse the generator:

 storage = arbitrary_storage() backup = storage.copy() key = arbitrary_key() value = arbitrary_value() if key in storage: return storage[key] = value assert len(storage) == len(backup) + 1 assert key in storage assert storage[key] == value assert all(storage[key] == backup[key] for key in backup)

We take an arbitrary storage, and check that we can add one more element there. So the generator can create storage with two elements. And you can also add an element to it. And so on (just remember such a thing as mathematical induction). As a result, the written three tests and the generator make it possible to reliably verify that an arbitrary number of different elements can be added to the storage. Only three short test! Here, in general, is the whole idea of tests based on properties:

find properties
check properties on a bunch of different data
profit!

By the way, this approach does not contradict the principles of TDD in any way - tests can also be written in the same way before the code (at least personally, I usually do this). Another thing is that getting such a test to become green can be much more difficult than the traditional one, but when it finally passes successfully, we will be sure that the code actually complies with a certain part of the contract.

This is all well and good, but ...

With all the attractiveness of the property-based testing approach, there is a whole bunch of problems. In this part I will try to make out the most common. And apart from the problem of the actual difficulty of finding useful properties (to which I will return in the next section), in my opinion the biggest trouble for beginners is often false confidence in good coverage. Indeed, we wrote several tests that generate hundreds of test cases - what can go wrong? If you look at the example from the previous part, then in fact a lot of things. For a start, the written tests offer no guarantee that storage.copy () will actually make a “deep” copy, and not just copy the pointer. Another hole is that there is no normal verification that key in storage will return False if the key you are looking for is not in the storage. And the list can still be continued. Well, one of my favorite examples - let's say we write sorting, and for some reason we consider that one test that checks the order of the elements is enough:

 input = arbitrary_list() output = sort(input) assert all(a <= b for a, b in zip(output, output[1:]))

And such an implementation of his excellent pass

 def sort(input): return [1, 2, 3]

I hope the moral here is clear.

The next problem, which in some sense can be called a consequence of the previous two - using property-based testing is often very difficult to achieve truly complete coverage. But in my opinion this is solved very simply - you do not need to write only tests based on properties, traditional tests have not been canceled. In addition, people are so arranged that it is much easier for them to understand things with concrete examples, which also speaks in favor of using both approaches. In general, I developed for myself something like the following algorithm - to write some very simple traditional tests, ideally, so that they could serve as an example of how you intend to use the API. As soon as there appeared a feeling that “for documentation” of tests is enough, but it’s still far from complete coverage to start adding tests based on properties.

Now to the question of frameworks, what to expect from them and why they are needed at all - after all, no one forbids them to run a test in a cycle, calling inside with random and enjoying life. In fact, the joy will be before the first drop of the test, and it is good if it is local, and not in some CI. First of all, since tests based on properties are randomized, we need a way to reliably reproduce the fallen case, and any self-respecting framework allows doing this. The most popular approaches are to output a certain seed to the console, which you can manually slip into the test runner and reliably reproduce the fallen case (convenient for debugging), or create a cache on the disk with “bad” sidami that will be automatically checked first ( helps with repeatability in CI). Another important aspect is data minification (shrinking in foreign sources). Since the data is generated randomly, there is a completely non-illusory chance to get on a falling test case with a container of 1000 items, which is to debug something else “fun”. Therefore, good frameworks, after finding a felling case, use a number of heuristics to try to find a more compact set of input data, which nevertheless will continue to fail the test. And finally - often half of the functionality of the test is an input data generator, so the presence of built-in generators and primitives that allow you to quickly assemble more complex simple generators also greatly help.

Also, there is a periodical criticism that tests based on properties are too much logic. However, this is usually accompanied by examples in the style

 data = totally_arbitrary_data() perform_actions(sut, data) if is_category_a(data): assert property_a_holds(sut) else if is is_category_b(data): assert property_b_holds(sut)

In fact, this is a fairly frequent anti-pattern (among newbies), don't do that! It is much better to break such a test into two different, and either skip unsuitable input data (in many frameworks there is even special means for this) if there is a small chance to get into them, or use more specialized generators, which will immediately issue only suitable data. The result should be something like

 data = totally_arbitrary_data() assume(is_category_a(data)) perform_actions(sut, data) assert property_a_holds(sut)

and

 data = data_from_category_b() perform_actions(sut, data) assert property_b_holds(sut)

Useful properties, and their habitats

Okay, how useful testing based on properties seems to be clear, the main pitfalls have been disassembled ... although not, the main thing is just unclear - where do you get these properties from? Let's try to search.

At least do not fall

The simplest option is to push arbitrary data into the system under test and check that it does not fall. In fact, this is a whole separate direction with the fashionable name fuzzing, for which there are specialized tools (for example AFL aka American Fuzzy Lop), but with some stretch this can be considered a special case of property-based testing, and if there are no ideas at all Do not climb, then you can start with it. However, as a rule, such tests rarely make sense, since potential failures are usually very good at testing other properties as well. The main reasons why I mention this “property” are to refer the reader to fuzzers and in particular AFL (there are a lot of English-language articles on this topic), well, for completeness.

Test oracle

One of the most boring properties, but in fact a very powerful thing that can be used much more often than it might seem. The idea is that sometimes there are two pieces of code that do the same thing, but in different ways. And then you can especially without understanding generating arbitrary input data, shove them in both variants and check that the results are the same. The most frequently cited example of application is to leave a slow but simple option when writing an optimized version of a function and drive tests against it.

 input = arbitrary_list() assert quick_sort(input) == bubble_sort(input)

However, the applicability of this property is not limited to this. For example, it often turns out that the functionality implemented by the system we want to test is a superset of something already implemented, often even in the standard language library. In particular, usually most of the functionality of some key-value storage (in memory or on disk, based on trees, hash tables or some more exotic data structures such as merkle patricia tree) can be tested with a standard standard dictionary. Testing any CRUDs - there too.

Another interesting application that I used personally - sometimes when implementing a numerical model of a system, some special cases can be calculated analytically, and the results of the simulation can be compared with them. In this case, as a rule, if you try to shove completely arbitrary data at the input, even if correctly implemented, the tests will still fall due to the limited accuracy (and therefore applicability) of the numerical solutions, but in the process of repairing by imposing restrictions on the generated input data, these same restrictions become known.

Requirements and Invariants

The basic idea here is that often the requirements themselves are formulated in such a way that they are easy to use as properties. In some articles, invariants are separately distinguished on such topics, but in my opinion, the border is too fragile here, since most of these invariants are direct consequences of the requirements, so perhaps I’ll dump it all together.

A small list of examples from a variety of areas that are convenient for checking properties:

class field must have a previously assigned value (setters)
the repository should be able to read the previously written item
adding a previously non-existing item to the repository does not affect previously added items
in many dictionaries can not be stored several different items with the same key
the height of the balanced tree should be no more $K \ cdot log (N)$ where $N$ - number of elements recorded
the result of sorting is a list of ordered elements.
the base64 encoding result must contain only characters from the base64 alphabet
the route construction algorithm must return a sequence of allowable movements that will lead from point A to point B
for all points of the contour constructed should be performed $f (x, y) = const$
the signature verification algorithm should return True if the signature is real and False otherwise
as a result of orthonormalization, all vectors in the basis must have unit length and zero reciprocal scalar products
vector transfer and rotation operations should not change its length

In principle, one could say that everything is complete, use test oracles, or look for properties in the requirements, but there are some more interesting "special cases" that I would like to mention separately.

Stateful testing and state testing

Sometimes you need to test something that has a state. In this case, the easiest way:

write a test that checks the correctness of the initial state (for example, the container just created is empty)
write a generator that using a set of random operations will lead the system to some arbitrary state
write tests for all operations using the result of the generator as the initial state

Very similar to mathematical induction:

prove claim 1
prove the assertion N + 1, assuming that the assertion N is true

Another method (sometimes giving a little more information about where it broke) is to generate a valid sequence of events, apply it to the system under test and check the properties after each step.

Back and forth

If suddenly there is a need to test a couple of functions for direct and inverse transformation of some data, then consider yourself lucky:

 input = arbitrary_data() assert decode(encode(input)) == input

Great for testing:

serialization-deserialization
encrypt-decrypt
encoding decode
transformations of the basic matrix into quaternion and back
direct and inverse coordinate transformation
direct and inverse Fourier transform

A private, but interesting case is the inversion:

 input = arbitrary_data() assert invert(invert(input)) == input

A striking example is the inversion or transposition of a matrix.

Idempotency

Some operations do not change the result when reapplied. Typical examples:

sorting
all sorts of vectors and bases
re-add an existing item to the set or dictionary
re-write the same data in some property of the object
reduction of data to canonical form (spaces in JSON lead to a single style for example)

Also idempotency can be used to test serialization-deserialization if the usual decode (encode (input)) == input method is not suitable because of the different possible representations for the equivalent input data (again, there are extra spaces in some JSON):

 def normalize(input): return decode(encode(input)) input = arbitrary_data() assert normalize(normalize(input)) == normalize(input)

Different paths, one result

Here, the idea boils down to exploiting the fact that sometimes there are several ways to do the same thing. It may seem that this is a special case of a test oracle, but in fact it is not quite so. The simplest example is the use of commutativity of some operations:

 a = arbitrary_value() b = arbitrary_value() assert a + b == b + a

It may seem trivial, but it's a great way to test:

addition and multiplication of numbers in a nonstandard representation (bigint, rational, that's all)
“Adding” points on elliptic curves in finite fields (hello, cryptography!)
union of sets (which can have completely non-trivial data structures inside)

In addition, adding the elements to the dictionary has the same property:

 A = dict() A[key_a] = value_a A[key_b] = value_b B = dict() B[key_b] = value_b B[key_a] = value_a assert A == B

The variant is more complicated - I thought for a long time how to describe in words, but only a mathematical notation comes to mind. In general, such transformations are common.

f (x)

$f (x)$ for which the property is executed

f (x + y) = f (x) c d o t f (y)

$f (x + y) = f (x) \ cdot f (y)$ , and both the argument and the result of the function are not necessarily just a number, but operations

+

$+$ and

c d o t

$\ cdot$ - just some binary operations on these objects. What can this test:

addition and multiplication of any strange numbers, vectors, matrices, quaternions ( $a \ cdot (x + y) = a \ cdot x + a \ cdot y$ )
linear operators, in particular, any integrals, differentials, convolutions, digital filters, Fourier transforms, etc. ( $F [x + y] = F [x] + F [y]$ )
operations on the same objects in different representations, for example
- $M (q_a \ cdot q_b) = M (q_a) \ cdot M (q_b)$ where $q_a$ and $q_b$ Are single quaternions, and $M (q)$ - the operation of converting a quaternion into an equivalent basic matrix
- $F [a \ circ b] = F [a] \ cdot F [b]$ where $a$ and $b$ - these are signals $\ circ$ - convolution $\ cdot$ - multiplication, and $F$ - Fourier transform

An example of a slightly more “routine” task is to test some tricky dictionary merge algorithm in some way:

 a = arbitrary_list_of_kv_pairs() b = arbitrary_list_of_kv_pairs() result = as_dict(a) result.merge(as_dict(b)) assert result == as_dict(a + b)

Instead of conclusion

That's basically all that I wanted to tell in this article. I hope it was interesting, and a little more people will begin to put all this into practice. To make the task a little easier, I will give a list of frameworks of different degrees of fitness for different languages:

Python: Hypothesis
Rust: Proptest
Scala: Scalacheck
C ++: Rapidcheck
JavaScript: FastCheck
Go: Gopter

And, of course, a special thank you to people who once wrote wonderful articles, thanks to which I learned about this approach a couple of years ago, stopped worrying and started writing tests based on properties:

Source: https://habr.com/ru/post/434008/

All Articles