The details of test-first that are so often left out

We have all heard of test-first, a development philosophy that encourages writing tests before code. I am sure that anyone who has tried to apply it in practice has run into the fact that sometimes you simply cannot write the test before the function (the usual response is to ignore the problem and quietly violate test-first locally). I believe the reason for such failures is fundamental, and I will try to show why.

To begin with, I should clarify that from here on I speak of testing a function in the broad sense of the word: testing some primitive unit of code. Let us set aside the question of which kind of unit should be tested (a method or a class, for example); these details do not affect the reasoning that follows. I use the expression "function testing" in this sense throughout the article.

It may seem that the industry figured out all the problems with test-first long ago, and that any failure stems only from our own insufficient skill with the necessary techniques, not from anything fundamental. Alas, programmers here and there keep asking the same questions about how exactly to do test-first, and the answers they get are sometimes unintelligible. I think it is no exaggeration to say that the community around the world suspects something, but a lot remains unsaid.

Red-green-refactor is not enough to actually use as a workflow. In real life TDD is more like this: pic.twitter.com/TuGxzrQQkg
- Sarah Mei (@sarahmei) September 4, 2015

Let's try to understand the problems that can fundamentally prevent us from following test-first in the form in which it is usually stated. Here is the plan of our reasoning:

1. In general, a test for a function can be written in advance only if the function is treated as a black box.
2. In the general case, a function should not (or even cannot) be tested as a black box.
3. From points 1 and 2 it immediately follows that, in the general case, a test for a function should not (or even cannot) be written in advance.
4. What to do?

1. The unwritten function can only be tested as a black box.

The term test-first is closely related to another, much more popular one today: TDD. I will not dwell on the differences between the two; suffice it to say that test-first is an integral part of TDD (although it can also be used separately). In the rest of the article I talk about test-first, keeping in mind that everything said is, with minimal clarification, also true of TDD.

At the moment when test-first tells me to write a test for a function, all I know about the function is its interface. The interface may not be final, but to begin development I am expected to settle on at least some version of it. Traditionally, an interface has two main parts: input and output. It should be understood, however, that the input of a function is not only the parameters it is called with, and the output is not only what it returns directly. A function can have several technical ways to return values: for example, the usual return, exceptions, and writing to parameters (the names vary). In addition, the state of the system under test can also serve as input and output. (The simplest example of such interaction with system state is a function that manipulates global variables. Degenerate as it is, this situation is not exceptional: the point of interaction can be the object the method is called on, singletons, global pools, databases in any form, and so on.)
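To make the "interface in the broad sense" concrete, here is a small sketch in Python. Every name in it (take_items, call_log, default_limit) is invented for illustration and does not come from the article; the point is only that one function can use several input and output channels at once.

```python
# Hypothetical example: all names here are invented for illustration.
call_log = []       # system state the function writes to (an output channel)
default_limit = 3   # system state the function reads (an input channel)


def take_items(queue, count=None):
    """Remove and return the first `count` items of `queue`."""
    if count is None:
        count = default_limit                 # input taken from global state
    if count > len(queue):
        raise ValueError("not enough items")  # output via an exception
    taken = queue[:count]
    del queue[:count]                         # output via mutation of a parameter
    call_log.append(count)                    # output via global state
    return taken                              # output via the return value
```

A test written against this interface has to set up and inspect all of these channels, not only the arguments and the returned list.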

So, all the knowledge available to me about the function before writing a test is its interface in the broad sense of the word. In this notion I include knowledge not only of how the function exchanges data, but also of what it does with the data (I do, of course, know the purpose of the function). Obviously, tests written with only the interface in mind are black-box tests: the internal logic and the source code of the function are not available to me, if only because I have not yet designed this logic or written the code.

Perhaps you believe that black-box testing is a great idea, exactly what we need, and that we would use it even if we already had the code. In the next section, however, I will try to show what problems this creates and why relying on it everywhere may be inappropriate. Even if this seems obvious, a certain formalization does not hurt.

2. The unwritten function should not be tested as a black box.

One of the main tasks facing the programmer who writes the test is the selection of input data on which the function should be checked.

As a rule, in modern industrial development a formal proof of program correctness is (a) practically impossible and (b) not required. Much of the confidence that a program does what it should rests on a certain hypothesis, which the programmer, with the help of tests, reduces to a set of simpler hypotheses that are easier to grasp informally.

What do I mean? Most often it is obvious to me that a function behaves identically on some subset of the input space, which is usually called an "equivalence class". When I say "obvious", I am referring to the very hypothesis on which I build my belief that the program works as it should (in fairness, this is a common problem in all engineering disciplines: some things have to be taken on trust). Without any hypotheses, testing would be useless; only a formal proof (which, I repeat, borders on the impossible) would help me.

But if I have such a hypothesis about equivalence classes, it is enough to test the function on a single input from a class to be confident that it works correctly throughout that class. So analyzing how all possible inputs split into equivalence classes, in order to select test examples for them, is one of the main tasks facing the author of a test.

But with black-box testing, where nothing is known about the actual equivalence classes, you cannot choose these examples in any sound way: either there will be too many of them, and extra work will be done, or too few, and the testing will be incomplete.

How does knowledge of the code affect our choice of equivalence classes? In two main ways: we can (a) use the fact that part of the code has already been tested, and (b) analyze the details of the algorithm.

Let's talk about each of them in more detail.

With a black box it is not known what has already been tested.

Let me start with an example. I am going to write the function number_of_german_letters(str), which returns the number of letters of the German alphabet contained in the string str.

This task, by the way, is not as simple as it may seem. The German alphabet contains all the Latin letters (A to Z), three letters with umlauts (Ä, Ö, Ü) and the eszett ligature (ß). Here are at least a few things that are easy to overlook: in Unicode, letters with umlauts exist both as single precomposed characters and as a combination of a Latin letter and a combining umlaut character. The letter ß traditionally has only a lowercase form (when a word containing ß is written in capitals, it is replaced with SS), but Unicode 5.1 added a capital form, ẞ. I am sure I have still missed something (for example, I simply do not know whether the old form of the ligature, ſs, is still in use and whether it should count as German).
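The Unicode pitfalls mentioned above can be observed directly. The following is a small illustrative sketch using Python's standard unicodedata module; the variable names are my own and are not part of the article.

```python
import unicodedata

precomposed = "\u00c4"   # Ä as a single code point
decomposed = "A\u0308"   # A followed by COMBINING DIAERESIS

# The two strings look the same when rendered but are different sequences.
same_without_normalizing = precomposed == decomposed

# Normalizing both to NFC makes them comparable.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# The default uppercase mapping of ß is the two-letter sequence SS ...
upper_of_eszett = "ß".upper()

# ... while the capital form U+1E9E lowercases back to ß.
lower_of_capital_eszett = "\u1e9e".lower()
```

A naive per-character check would therefore count the decomposed Ä as one German letter plus one unknown character unless the string is normalized first.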

The question immediately arises: do I already have a function that checks whether a letter belongs to the German alphabet (say, is_german_letter)? If I do, and I use it in number_of_german_letters, then I do not need to recheck the recognition of German letters. Only the code that counts German letters needs checking: that they are recognized correctly is already "proved" by the test for is_german_letter. Rechecking is not only useless but, most likely, harmful.

If re-checking does not seem harmful to you, here are a few arguments that can convince you:

If I recheck the work of is_german_letter in number_of_german_letters, then, logically, I should also recheck the still lower-level functions, including the library for working with Unicode, and the pointlessness of that is more obvious.
Exactly the same logic applies when number_of_german_letters itself is used in higher-level functions, including those that present this data to the user as a picture (for example), and this would be a waste of effort. The fallacy of this approach is particularly noticeable with wrapper functions, which add little or nothing. If I have the functions number_of_german_letters_int, number_of_german_letters_float and number_of_german_letters_str, I will have to repeat all the tests for number_of_german_letters in each of them (and, following the same logic, all the tests for every lower-level function as well).
A counterargument is that full retesting has a bonus of its own. When each test rechecks the function as a whole, knowing nothing about its dependencies or internal structure, a green test really does tell you that this function works. When tests check only the new logic that a function introduces, a red test may mean that any number of other functions are broken too (the dependencies are unknown, and arbitrarily many functions may depend on the one that went red). Usually, however, this weaker interpretation of the results is quite enough and causes no problem: you simply fix first what broke. Full retesting at every level is not worth it.

However, let me remind you that we are dealing with black-box testing, which means we do not know whether is_german_letter will be used. Yet this knowledge plays a crucial role in selecting the input data sets discussed above. If is_german_letter is used, the strings abc1Ö and abc1ß actually test the same thing, that is, they belong to the same set of inputs (the equivalence within which is postulated by my hypothesis). If, however, number_of_german_letters determines the "Germanity" of letters on its own, these strings may well test different aspects of the function.

It is also completely unknown whether the function will handle Unicode correctly: since it is a black box, I cannot be sure that a ready-made, tested library will be used for working with Unicode. So I would need to check, for example, how the function behaves on various invalid Unicode sequences.

So, when testing a function as a black box, I have to repeat tests that have already been done, again and again. Yes, there are some components that I trust subconsciously (a library for working with Unicode, for example), but this trust has no clear boundaries. It is worth adding that the testing task is best formulated not as "check everything the function does" but as "check only the logic that it introduces". If the function only counts German letters, the test should check its ability to count letters. True, I would also like to make sure that it calls the right function to determine the "Germanity" of letters (that is, to check the integration), but one test is usually enough for that, not a full retest. (There is a theoretical justification: with this approach the function still has a single equivalence class here, which we confirm with one test.)
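A rough sketch of "check only the logic that the function introduces" might look as follows. The bodies of both functions are my own assumptions (the article gives only their names), and the helper deliberately ignores the capital ẞ, ſ, and decomposed umlauts.

```python
# Assumption: a simplistic helper that stands in for an already tested one.
GERMAN_EXTRA = set("ÄÖÜäöüß")


def is_german_letter(ch):
    """Hypothetical helper, assumed to have its own thorough test suite."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z") or ch in GERMAN_EXTRA


def number_of_german_letters(s):
    # The only new logic here is iterating and counting;
    # recognition of letters is delegated to is_german_letter.
    return sum(1 for ch in s if is_german_letter(ch))


def test_counting_logic():
    # Tests for the counting itself.
    assert number_of_german_letters("") == 0
    assert number_of_german_letters("abc1") == 3


def test_integration_with_helper():
    # One integration check: a non-ASCII German letter reaches the helper.
    assert number_of_german_letters("abc1Ö") == 4
```

Given this implementation, a further test with abc1ß would land in the same equivalence class as abc1Ö and add nothing.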

All this greatly limits my ability to test with the black-box method. We will talk about how to solve this problem in the fourth part, but first let us consider another factor that impedes such testing.

Unknown algorithm details

Obviously, when testing a black box we have no knowledge of the algorithms used inside it (that is why it is a black box). It may seem that this does not always get in the way of choosing test data: sometimes at least some of it can be selected from the statement of the problem. But this is a false impression: all such considerations may turn out to be wrong for a different implementation of the same functionality. Whether or not the code contains a branch, whether or not libraries are used: all of this influences which test data you need to select.

An interesting example is optimization. The code of a function may work differently on values that appear to be uniform. For example, on a binary processor I can multiply by 2ⁿ with a shift operation instead of a multiplication: this optimization makes separate checks necessary, yet the statement of the problem itself (multiplying two numbers) in no way singles out powers of two. Sometimes the special status of particular values is completely invisible before implementation.
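As an illustration, here is what such an optimized multiplication could look like in Python. This is a hypothetical sketch (Python itself gains nothing from the trick); it only shows how an implementation detail creates an equivalence class that the problem statement does not mention.

```python
def multiply(a, b):
    """Multiply two integers; hypothetical version with a shift fast path."""
    if b > 0 and (b & (b - 1)) == 0:
        # b is a power of two: shift left by log2(b) instead of multiplying.
        return a << (b.bit_length() - 1)
    return a * b


def test_multiply():
    assert multiply(3, 7) == 21   # general path
    assert multiply(3, 8) == 24   # fast path, visible only in the code
    assert multiply(5, 1) == 5    # boundary of the fast path (2 to the power 0)
    assert multiply(5, 0) == 0    # b = 0 must not take the fast path
```

A black-box test author has no reason to include the last three cases; someone who has read the code does.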

To be fair, optimization can also be viewed as a separate feature, added in a separate iteration with its own test-first. And yet one should not think that an unexpected jump in behavior between neighboring values is a great rarity. Two more vivid examples come to mind:

When setting the lifetime of a value in memcached, any value greater than 60*60*24*30 is treated as the number of seconds since the beginning of the UNIX epoch, and any smaller value as the number of seconds from the current moment.
In Ruby, strings of up to 23 characters are stored in memory differently than those that are longer.
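The memcached rule above can be modelled in a few lines. This is a simplified sketch of the rule as stated here, not memcached's actual code; the function name is invented, and special values such as zero are deliberately ignored.

```python
THIRTY_DAYS = 60 * 60 * 24 * 30  # the threshold, in seconds


def expiration_timestamp(exptime, now):
    """Hypothetical model of how an expiration value is interpreted."""
    if exptime > THIRTY_DAYS:
        return exptime        # treated as an absolute UNIX timestamp
    return now + exptime      # treated as an offset from the current moment


def test_threshold():
    now = 1_000_000_000
    # Two neighboring inputs fall into different equivalence classes.
    assert expiration_timestamp(THIRTY_DAYS, now) == now + THIRTY_DAYS
    assert expiration_timestamp(THIRTY_DAYS + 1, now) == THIRTY_DAYS + 1
```

Nothing in the signature "set a lifetime in seconds" hints that 2592000 and 2592001 need separate tests.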

It may seem to you that this subsection and the previous one have much in common, and you would be right: they are different manifestations of the same problem. The first deals mainly with tests that seem necessary but are not, while the second, on the contrary, deals with tests that seem unnecessary but are in fact needed.

3. Conclusions from points 1 and 2

So, in point 1 I tried to show that test-first inevitably forces us into black-box testing. In point 2 I described the fundamental and intractable problems that arise in black-box testing. If the reasoning in points 1 and 2 is correct, we must admit that test-first in general comes with problems we have no way to avoid.

What to do? In the next section we will talk about possible modifications of test-first that help us get around these problems (since we have no fundamental way to solve them). It should also be noted that although everything said applies primarily to unit tests of a function, it also holds for integration testing (which is often done as black-box testing).

4. How to work with test-first

So, our task is to refine test-first in a way that gets us around the problems described in the previous sections. At the same time, these modifications should, as far as possible, not deprive us of the advantages we want from test-first.

What does test-first give us? This is a broad topic, and different authors point to different advantages; a comparative analysis is beyond the scope of this article, so I will just give a non-exhaustive list:

Test-first helps the programmer understand more clearly what the function should do, even before starting to write it.
The programmer can also try out the interface before implementation begins and perhaps find problems in it at an early stage, when abandoning it costs almost nothing.
With test-first you will most likely not write untested or poorly testable code.
Test-first disciplines development as a whole: with it you will have no opportunity to "forget" some tests, and you will most likely not even think of writing 500-line functions (testing such a function is, as a rule, a monstrously time-consuming task).

Next, I will describe a set of techniques that I use in my daily work and that let me keep the charm and benefits of test-first while avoiding the negative consequences discussed in the preceding sections. They rest on the fact that a "test" contains both components that can be written before the code and components that cannot. The two must be separated from each other by introducing another level of abstraction.

Write the test pattern before the function

We have already established that there is no way to write an exhaustive test suite before the code, so my solution is this: I create a special test pattern.

Let me explain what I mean by this pattern. Strictly speaking, any test can be represented as code that iterates over a set of pairs (IN, OUT) and verifies that, given the input data IN, the function produces the output OUT. I will call this set of pairs the table. Remember that we are talking about the interface of the function in the broad sense of the word (see point 1). In practice, IN and OUT can be arbitrarily complex, but in our number_of_german_letters example they are likely to be pairs (source_string, letter_number). Given everything said in the previous sections, it is hard to build the table before writing the code, but knowing only the interface I can write the code that checks a single pair of IN and OUT. Less formally: I can decide what I want to check and how, but I cannot yet know on exactly which values.

If you view a test as code that serves such a table, then I can write this code before the function, but I cannot yet fill the table with data. That is, I already know how to set up the input of the function and how to extract the result of its work from the system under test, but I do not yet know which pairs to include in the test. What does this mean in practice? For our number_of_german_letters function, if there is an internal function that determines "Germanity", a small number of rows in the table will do, whereas without one the table will need many more. But the test pattern is the same in both cases.

So, by a test pattern I mean code in which all that remains is to plug in the pairs (IN, OUT). You can start, for example, by placing all your checks inside a loop over an empty table: all the code is then ready but not yet executed (since the loop never iterates). Although this form matches the idea most closely, I almost never use it in practice. Instead, I usually apply the idea in a somewhat simplified form.
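The loop-over-an-empty-table form could be sketched like this. It is a minimal illustration under my own assumptions: the names TABLE and test_number_of_german_letters are invented, and the function body is a placeholder because, at this stage, the function has not been written.

```python
def number_of_german_letters(s):
    # Placeholder: in test-first, the real body does not exist yet.
    raise NotImplementedError


# The table of (IN, OUT) pairs, left empty until the implementation
# shows which equivalence classes actually need a row.
TABLE = []


def test_number_of_german_letters():
    for source_string, letter_number in TABLE:
        # How to call the function and how to compare the result
        # is already decided; only the data is missing.
        assert number_of_german_letters(source_string) == letter_number
```

Running the test now passes trivially, because the loop body never executes; rows such as ("abc1", 3) are added once the function exists.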

Instead of looping over the table, I simply write the code that corresponds to one iteration, with one pair of values. Which values do I choose? It does not really matter, because one such pair is never redundant. IN will always belong to one of the sets of values I am going to test, whether I end up with one such set or a dozen, because together the sets must cover all values anyway. Even for a primitive wrapper this single check is useful, since it verifies the integration with other functions. When I need to extend the test to more than one pair (IN, OUT), I can easily wrap this code in a loop over the table (or just accumulate checks, if that seems more appropriate).
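The simplified form, and its later extension into a loop, might look roughly like this. The function body is a stand-in of my own so that the sketch runs (it counts any alphabetic character, not only German letters), and the test names are hypothetical.

```python
def number_of_german_letters(s):
    # Stand-in implementation, an assumption made only so the sketch runs.
    return sum(1 for ch in s if ch.isalpha())


def test_single_pair():
    # One iteration of the pattern with one arbitrary (IN, OUT) pair,
    # written before the function itself.
    assert number_of_german_letters("abc1") == 3


def test_table():
    # The same check wrapped in a loop once more pairs are needed.
    table = [("abc1", 3), ("", 0), ("abc1Ö", 4)]
    for source_string, letter_number in table:
        assert number_of_german_letters(source_string) == letter_number
```

The body of the loop in test_table is the line from test_single_pair with the literals replaced by table columns, which is why the extension is cheap.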

Return to the test after writing the function

, , , ( ). , - , . . ( ).

, , ( ).

: . , , , , , , , , . , . , , , .

, , test-first test-template-first, , . , .

test-first, , . , , ; , , . , , , - , , , .

, , , , . , .

, , , «» . , : , , . test-first test-last, .

, : test-first — , , , , .

— (, ), .

Conclusions

, test-first — , , , , . , , . , - , , , , .

— nickolas_v , ( ) , .

Source: https://habr.com/ru/post/274771/
