Hello! My name is Artyom, and most of my working time I write complex auto-tests on Selenium and Cucumber / Calabash. Honestly, quite often I find myself faced with a difficult choice: to write a test that checks a specific implementation of functionality (because it is easier) or a test that tests functionality (because it is more correct, but much more difficult)? Recently, I came across a nice article that implementation tests are “tautological” tests. And, having read it, I have been rewriting some tests in a different way for almost a week. I hope she pushes you to thoughts too.
Everyone knows that tests are essential for quickly creating high-quality software. But, like everything else in our lives, if used improperly, they can do more harm than good. Consider the following simple function and test. In this case, the author wants to protect the tests from external dependencies, so the stubs are used.
import hashlib from typing import List from unittest.mock import patch def get_key(key: str, values: List[str]) -> str: md5_hash = hashlib.md5(key) for value in values: md5_hash.update(value) return f'{key}:{md5_hash.hexdigest()}' @patch('hashlib.md5') def test_hash_values(mock_md5): mock_md5.return_value.hexdigest.return_value = 'world' assert get_key('hello', ['world']) == 'hello:world' mock_md5.assert_called_once_with('hello') mock_md5.return_value.update.assert_called_once_with('world') mock_md5.return_value.hexdigest.assert_called()
Looks great! Four statements have been fully tested to ensure that the code works as expected. Tests even pass!
$ python3.6 -m pytest test_simple.py ========= test session starts ========= itemstest_simple.py . ======= 1 passed in 0.03 seconds ======
Of course, the problem is that the code is wrong. md5 only accepts bytes
, not str
( this post explains how bytes
and str
changed in Python 3). The test script does not play a big role; only string formatting was tested here, which gives us a false sense of security: it seems to us that the code was written correctly, and we even proved it with the help of test scripts!
Fortunately, mypy catches these problems:
$ mypy test_simple.py test_simple.py:6: error: Argument 1 to “md5” has incompatible type “str”; expected “Union[bytes, bytearray, memoryview]” test_simple.py:8: error: Argument 1 to “update” of “_Hash” has incompatible type “str”; expected “Union[bytes, bytearray, memoryview]”
Remarkably, we fixed our code to first transcode strings to bytes:
def get_key(key: str, values: List[str]) -> str: md5_hash = hashlib.md5(key.encode()) for value in values: md5_hash.update(value.encode()) return f'{key}:{md5_hash.hexdigest()}'
Now the code works, but the problems remain. Suppose someone walked through our code and simplified it to just a few lines:
def get_key(key: str, values: List[str]) -> str: hash_value = hashlib.md5(f"{key}{''.join(values)}".encode()).hexdigest() return f'{key}:{hash_value}'
Functionally obtained is identical to the source code. For the same input data, it will always return the same result. But even in this case, the test passes with an error:
E AssertionError: Expected call: md5(b'hello') E Actual call: md5(b'helloworld')
Obviously, there is some problem with this simple test. Here at the same time there is a first kind error (the test fails even if the code is correct) and a second kind error (the test does not fall when the code is incorrect). In an ideal world, tests will fall if (and only if) the code contains an error. And in an even more perfect world, when passing tests, you can be completely sure of the correctness of the code. And although both ideals are unattainable, it is worth striving for them.
The tests described above, I call "tautological." They confirm the correctness of the code, ensuring that it is executed as written, which, of course, assumes that it is written correctly.
I believe that the tautological tests are an undoubted negative for your code. For several reasons:
In other words, tautological tests often miss real problems, stimulating the bad habit of blindly correcting tests, and at the same time the benefits of them do not pay for their efforts to support them.
Let's rewrite the test to check the output:
def test_hash_values(mock_md5): expected_value = 'hello:fc5e038d38a57032085441e7fe7010b0' assert get_key('hello', ['world']) == expected_value
Now the details of get_key
are not important for the test, it will fail only if get_key
returns an incorrect value. I can change the internals of get_key
as I get_key
without updating the tests (until I change public behavior). In this case, the test is short and easy to understand.
Although this is a contrived example, in real code it is easy to find places where, for the sake of increasing code coverage, it is assumed that the output of external services meets the implementation expectations.
The test code cannot be edited without matching with the implementation. In this case, there is a great chance that you got a tautological test. In Testing on the Toilet: Don't Overuse Mocks you will find a very familiar example. You can recreate the implementation itself based on this test:
public void testCreditCardIsCharged() { paymentProcessor = new PaymentProcessor(mockCreditCardServer); when(mockCreditCardServer.isServerAvailable()).thenReturn(true); when(mockCreditCardServer.beginTransaction()).thenReturn(mockTransactionManager); when(mockTransactionManager.getTransaction()).thenReturn(transaction); when(mockCreditCardServer.pay(transaction, creditCard, 500)).thenReturn(mockPayment); when(mockPayment.isOverMaxBalance()).thenReturn(false); paymentProcessor.processPayment(creditCard, Money.dollars(500)); verify(mockCreditCardServer).pay(transaction, creditCard, 500); }
Avoid stubs in memory objects. For stubs to use dependencies that are entirely in memory, we need very good reasons. Perhaps the underlying function is non-deterministic or it takes too long to execute. The use of real objects increases the value of tests by testing a larger number of interactions in the test scenario. But even in this case, there should be tests to ensure that the code correctly uses these dependencies (such as a test that checks that the output is in the expected range). Below is an example in which we check that our code works if randint
returns a specific value and that we correctly call randint
.
import random from unittest.mock import patch def get_thing(): return random.randint(0, 10) @patch('random.randint') def test_random_mock(mock_randint): mock_randint.return_value = 3 assert get_thing() == 3 def test_random_real(): assert 0 <= get_thing() < 10
It is better to leave a line of code uncovered than to create the illusion that it is well tested.
Also pay attention to the tautological tests, conducting a revision of someone else's code. Ask yourself what the test actually tests, and not just cover any lines of code.
Remember, tautological tests are bad because they are not good.
Source: https://habr.com/ru/post/336194/
All Articles