
Tautological tests


Hello! My name is Artyom, and I spend most of my working time writing complex automated tests with Selenium and Cucumber/Calabash. Honestly, I quite often face a difficult choice: should I write a test that checks a specific implementation of a feature (because it is easier) or a test that checks the feature itself (because it is more correct, but much harder)? Recently I came across a nice article arguing that implementation tests are "tautological" tests. After reading it, I spent almost a week rewriting some of my tests differently. I hope it gets you thinking too.


Everyone knows that tests are essential for building high-quality software quickly. But, like everything else in life, they can do more harm than good if used improperly. Consider the following simple function and its test. The author wants to isolate the test from external dependencies, so mocks are used.


```python
import hashlib
from typing import List
from unittest.mock import patch

def get_key(key: str, values: List[str]) -> str:
    md5_hash = hashlib.md5(key)
    for value in values:
        md5_hash.update(value)
    return f'{key}:{md5_hash.hexdigest()}'


@patch('hashlib.md5')
def test_hash_values(mock_md5):
    mock_md5.return_value.hexdigest.return_value = 'world'
    assert get_key('hello', ['world']) == 'hello:world'
    mock_md5.assert_called_once_with('hello')
    mock_md5.return_value.update.assert_called_once_with('world')
    mock_md5.return_value.hexdigest.assert_called()
```

Looks great! All four statements are fully covered to make sure the code works as expected. The test even passes!


```
$ python3.6 -m pytest test_simple.py
========= test session starts =========
test_simple.py .
======= 1 passed in 0.03 seconds ======
```

Of course, the problem is that the code is wrong. `md5` accepts only `bytes`, not `str` (this post explains how bytes and str changed in Python 3). The test adds almost nothing: it only checks the string formatting, and it gives us a false sense of security. We believe the code is correct, and we have even "proved" it with a test!
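A quick way to see the bytes/str distinction for yourself: in Python 3, `hashlib.md5` raises `TypeError` when given a `str`, so text must be encoded first. This is a minimal sketch; the helper name `md5_hex` is hypothetical and not from the original article.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hypothetical helper: encode text to UTF-8 bytes, then hash it."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


try:
    hashlib.md5('hello')  # a str, not bytes: rejected in Python 3
except TypeError as exc:
    print('TypeError:', exc)

print(md5_hex('hello'))  # 5d41402abc4b2a76b9719d911017c592
```

Note that the mocked test could never surface this `TypeError`, because the patched `md5` accepts anything it is given.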


Fortunately, mypy catches these problems:


```
$ mypy test_simple.py
test_simple.py:6: error: Argument 1 to "md5" has incompatible type "str"; expected "Union[bytes, bytearray, memoryview]"
test_simple.py:8: error: Argument 1 to "update" of "_Hash" has incompatible type "str"; expected "Union[bytes, bytearray, memoryview]"
```

Great, let's fix the code so it encodes the strings to bytes first:


```python
def get_key(key: str, values: List[str]) -> str:
    md5_hash = hashlib.md5(key.encode())
    for value in values:
        md5_hash.update(value.encode())
    return f'{key}:{md5_hash.hexdigest()}'
```

Now the code works, but problems remain. Suppose someone refactors our code and simplifies it to just a couple of lines:


```python
def get_key(key: str, values: List[str]) -> str:
    hash_value = hashlib.md5(f"{key}{''.join(values)}".encode()).hexdigest()
    return f'{key}:{hash_value}'
```

The result is functionally identical to the original code: for the same input it always returns the same output. Yet this time the test fails:


```
E AssertionError: Expected call: md5(b'hello')
E Actual call: md5(b'helloworld')
```
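Whether the one-line refactoring really behaves like the original is something a test can check directly, without mocks. A minimal sketch, assuming both versions sit side by side under the hypothetical names `get_key_loop` and `get_key_oneliner`: it compares only their outputs on a few inputs, so it passes for this refactoring even though the call-checking test fails.

```python
import hashlib
from typing import List


def get_key_loop(key: str, values: List[str]) -> str:
    # Original version: feed the key, then each value, into the hash.
    md5_hash = hashlib.md5(key.encode())
    for value in values:
        md5_hash.update(value.encode())
    return f'{key}:{md5_hash.hexdigest()}'


def get_key_oneliner(key: str, values: List[str]) -> str:
    # Refactored version: hash the concatenation in a single call.
    hash_value = hashlib.md5(f"{key}{''.join(values)}".encode()).hexdigest()
    return f'{key}:{hash_value}'


CASES = [
    ('hello', ['world']),
    ('hello', []),
    ('', ['a', 'b', 'c']),
    ('key', ['ünïcode', 'values']),
]

for key, values in CASES:
    # Compare observable output only, never the internal calls.
    assert get_key_loop(key, values) == get_key_oneliner(key, values)
print('both implementations agree on all cases')
```

The two agree because incremental `update` calls hash the same byte stream as one call on the concatenation, and UTF-8 encoding of a concatenation equals the concatenation of the encodings.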

Clearly, something is wrong with this simple test. It exhibits both a type I error (the test fails even though the code is correct) and a type II error (the test passes even though the code is wrong). In an ideal world, tests fail if (and only if) the code contains a bug. In an even better world, passing tests would let you be completely sure the code is correct. Neither ideal is attainable, but both are worth striving for.


I call tests like the one above "tautological." They assert that the code is correct by checking that it executes as written, which, of course, assumes that it was written correctly.



I believe tautological tests are a clear net negative for a codebase, for several reasons:


  1. Tautological tests give engineers a false sense that their code is correct. They can look at the high code coverage and feel good about their project. Other people working in the same codebase will confidently push changes as long as the tests pass, even though those tests do not actually check anything.
  2. Tautological tests effectively "freeze" the implementation instead of checking that the code behaves as intended. Any change to the implementation must be mirrored in the tests, whereas tests should only need to change when the expected output changes. This trains engineers to patch tests whenever they fail instead of investigating why they fail. Once that happens, the tests become a burden and lose their original purpose as a tool for keeping bugs out of production.
  3. Static analysis tools can find the blatant errors, such as typos, that tautological tests would catch anyway. These tools have improved significantly over the past five years, especially for dynamic languages: for example, mypy for Python, Hack for PHP, and TypeScript for JavaScript. They are often better at catching typos, and they are more valuable to engineers because they make code easier to understand and navigate.

In other words, tautological tests often miss real problems, encourage the bad habit of blindly fixing tests, and do not deliver enough benefit to justify the cost of maintaining them.


Let's rewrite the test to check the output:


```python
def test_hash_values():
    # Compare against the known MD5 digest of 'helloworld'; no mocks involved.
    expected_value = 'hello:fc5e038d38a57032085441e7fe7010b0'
    assert get_key('hello', ['world']) == expected_value
```

Now the internals of get_key don't matter to the test: it fails only if get_key returns a wrong value. I can change the internals of get_key however I like without updating the tests (as long as I don't change its public behavior). And the test is short and easy to understand.
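The same output-only approach extends naturally to a small table of cases. A sketch, assuming the bytes-encoding version of get_key (repeated here so the block runs on its own); the expected values are the standard MD5 digests of the concatenated strings, for example the MD5 of the empty string is d41d8cd98f00b204e9800998ecf8427e. The table layout and test name are illustrative, not from the original article.

```python
import hashlib
from typing import List


def get_key(key: str, values: List[str]) -> str:
    md5_hash = hashlib.md5(key.encode())
    for value in values:
        md5_hash.update(value.encode())
    return f'{key}:{md5_hash.hexdigest()}'


# (key, values, expected) triples: only inputs and outputs, no mocks.
CASES = [
    ('hello', ['world'], 'hello:fc5e038d38a57032085441e7fe7010b0'),
    ('hello', ['wor', 'ld'], 'hello:fc5e038d38a57032085441e7fe7010b0'),
    ('hello', [], 'hello:5d41402abc4b2a76b9719d911017c592'),
    ('', [], ':d41d8cd98f00b204e9800998ecf8427e'),
]


def test_get_key_outputs():
    for key, values, expected in CASES:
        assert get_key(key, values) == expected, (key, values)


if __name__ == '__main__':
    test_get_key_outputs()
```

The second case documents that the hash depends only on the concatenated text, not on how it is split into values. If that is not the intended contract, the case is worth dropping or turning into a deliberate bug report.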


This is a contrived example, but in real code it is easy to find places where, for the sake of higher coverage, tests simply assume that external services return exactly what the implementation expects.


How to identify tautological tests


  1. Tests break on updates far more often than the code under test does. We always pay a price for code coverage. If that price exceeds the benefit the tests provide, the tests are probably too tightly coupled to the implementation. A related symptom: small changes to the code under test require updating a much larger number of tests.
  2. The test code cannot be edited without consulting the implementation. If so, there is a good chance you have a tautological test. Testing on the Toilet: Don't Overuse Mocks contains a very familiar example; you could reconstruct the implementation itself from this test:


     ```java
     public void testCreditCardIsCharged() {
       paymentProcessor = new PaymentProcessor(mockCreditCardServer);
       when(mockCreditCardServer.isServerAvailable()).thenReturn(true);
       when(mockCreditCardServer.beginTransaction()).thenReturn(mockTransactionManager);
       when(mockTransactionManager.getTransaction()).thenReturn(transaction);
       when(mockCreditCardServer.pay(transaction, creditCard, 500)).thenReturn(mockPayment);
       when(mockPayment.isOverMaxBalance()).thenReturn(false);
       paymentProcessor.processPayment(creditCard, Money.dollars(500));
       verify(mockCreditCardServer).pay(transaction, creditCard, 500);
     }
     ```
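For contrast, here is a rough Python sketch of a state-based alternative to that mock chain: the test runs the code against a small in-memory fake and checks the outcome instead of scripting every call. All names (`CreditCardServer`, `FakeCreditCardServer`, `PaymentProcessor`, `charges`) are hypothetical and only loosely modeled on the Java example; they are not a real API.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class CreditCardServer(Protocol):
    """Hypothetical interface the payment code depends on."""

    def is_server_available(self) -> bool: ...

    def pay(self, card: str, amount: int) -> None: ...


@dataclass
class FakeCreditCardServer:
    """Hypothetical in-memory fake: records charges instead of contacting a bank."""

    available: bool = True
    charges: Dict[str, int] = field(default_factory=dict)

    def is_server_available(self) -> bool:
        return self.available

    def pay(self, card: str, amount: int) -> None:
        self.charges[card] = self.charges.get(card, 0) + amount


class PaymentProcessor:
    """Hypothetical code under test."""

    def __init__(self, server: CreditCardServer) -> None:
        self.server = server

    def process_payment(self, card: str, amount: int) -> bool:
        if not self.server.is_server_available():
            return False
        self.server.pay(card, amount)
        return True


def test_credit_card_is_charged() -> None:
    server = FakeCreditCardServer()
    processor = PaymentProcessor(server)
    assert processor.process_payment('4111-1111', 500)
    # Assert on the resulting state, not on the sequence of internal calls.
    assert server.charges == {'4111-1111': 500}
```

Because the test only inspects the fake's recorded charges, it keeps passing if `PaymentProcessor` is restructured internally, as long as the observable result stays the same.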


How to fix tautological tests


  1. Separate I/O from logic. I/O is the most common reason engineers reach for mocks. Yes, I/O is extremely important; without it, all we could do is burn CPU cycles and heat the air. But it is better to push I/O to the edges of your code instead of mixing it with logic. The Python Sans-I/O working group has written excellent documentation on this, and Cory Benfield gave a great talk on it, "Building Protocol Libraries The Right Way", at PyCon 2016.
  2. Avoid mocking in-memory objects. Mocking dependencies that live entirely in memory requires a very good reason, for example that the underlying function is non-deterministic or too slow. Using real objects makes tests more valuable, because each test scenario exercises more interactions. Even when you do mock, there should also be tests that check the code uses those dependencies correctly (for example, a test that checks the output falls within the expected range). Below is an example that checks both that our code works when randint returns a specific value and that we call randint correctly.


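     ```python
     import random
     from unittest.mock import patch


     def get_thing():
         return random.randint(0, 10)


     @patch('random.randint')
     def test_random_mock(mock_randint):
         # Pin randint to a known value to test the deterministic path.
         mock_randint.return_value = 3
         assert get_thing() == 3


     def test_random_real():
         # random.randint(a, b) includes both endpoints, so 10 is a valid result.
         assert 0 <= get_thing() <= 10
     ```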

  3. Use fixture data. If the mocked dependency is an external service, build a set of fake data or use a stub server that serves fixture data. Centralizing the fake implementation lets you carefully emulate the real implementation's behavior and minimizes test changes when the implementation changes.
  4. Don't be afraid to leave some code uncovered! If the choice is between testing code well and not testing it at all, the answer is obvious: test it well. But the choice between a tautological test and no test is less obvious. I hope I have convinced you that tautological tests are harmful. Uncovered code tells other developers how things actually stand, so they know to be careful when modifying it. Better still, use the techniques above to write proper tests.
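To make the first point (separating I/O from logic) concrete, here is a minimal sketch in the spirit of the Sans-I/O approach, though not taken from its documentation: the logic is a pure function over a string, and a thin shell does the file reading. The function names and the `key=value` config format are hypothetical.

```python
import os
import tempfile
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Pure logic: parse 'key=value' lines, skipping blanks and '#' comments."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')
        if not sep:
            raise ValueError(f'malformed line: {line!r}')
        result[key.strip()] = value.strip()
    return result


def load_config(path: str) -> Dict[str, str]:
    """Thin I/O shell: read the file, delegate everything else to parse_config."""
    with open(path, encoding='utf-8') as f:
        return parse_config(f.read())


def test_parse_config():
    # No mocks: the logic is exercised with an in-memory string.
    text = '# comment\nhost = example.org\n\nport=8080\n'
    assert parse_config(text) == {'host': 'example.org', 'port': '8080'}


def test_load_config_reads_real_file():
    # A single real-I/O test for the shell, using a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'app.cfg')
        with open(path, 'w', encoding='utf-8') as f:
            f.write('debug=true\n')
        assert load_config(path) == {'debug': 'true'}
```

All the interesting behavior lives in `parse_config`, which can be tested with plain inputs and outputs. The shell is small enough that one integration test against a real temporary file covers it, so there is nothing left to mock.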


It is better to leave a line of code uncovered than to create the illusion that it is well tested.


Also watch for tautological tests when reviewing other people's code. Ask yourself what a test actually checks, not just which lines of code it covers.


Remember, tautological tests are bad because they are not good.


What to read on the topic




Source: https://habr.com/ru/post/336194/

