Unit Tests in ABAP. Part Three. All Sorts of Fuss

This article is focused on ABAP developers in SAP ERP systems. It contains many platform-specific points that are of little interest or even controversial for developers using other platforms.

This is the third part of the series. The earlier parts are here:
Unit Tests in ABAP. Part one. First test
Unit Tests in ABAP. Part two. Rake

Let's measure

It is commonly held that the main quality metric for tests is coverage. On development sites you often come across phrases along the lines of “full coverage”. As a rule, full coverage is understood as a kind of absolute: 100.00%.
The coverage percentage is a dubious figure, just as dubious as the “average temperature across the hospital”. The coverage percentage for a project is the average of the coverage of its parts. That is: Module-1 has 80% coverage, Module-2 has 20% coverage, so the average is 50%, assuming the modules are roughly equal in size. And is it true that 80% is four times better than 20%?

Averages come in different kinds, too. ABAP Unit has three different coverage metrics:

by procedure (procedure coverage)
by statement (statement coverage)
by branch (branch coverage)

For example, there is a class, function group, or subroutine pool:

form do_something.
  c = a.
  d = b.
endform.

form do_something_else.
  if a > b.
    c = a.
    d = a.
  else.
    c = b.
    d = b.
    if d > 1000.
      d = 1000.
    endif.
  endif.
endform.

form do_nothing.
  if 1 = 2.
    c = d = 0.
  endif.
endform.

NB: A subroutine pool is easier to demonstrate than a function group or a class with methods: it takes far fewer characters to describe than a class. Parameter definitions are left out. For this small demonstration the difference does not matter. And in general: all variables are fictional, and any resemblance to production code is coincidental.

Suppose we wrote one simple test for each subroutine (function, method). For all subroutines we will use the values [A = 7, B = 77].

class lcl_test definition for testing duration short risk level harmless.
  private section.
    methods: setup.
    methods: do_something for testing.
    methods: do_something_else for testing.
    methods: do_nothing for testing.
endclass.

class lcl_test implementation.
  method setup.
    a = 7.
    b = 77.
  endmethod.

  method do_something.
    perform do_something.
  endmethod.

  method do_something_else.
    perform do_something_else.
  endmethod.

  method do_nothing.
    perform do_nothing.
  endmethod.
endclass.

NB: For now, let there be one common initialization and no check of the result.
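A test that only calls the code still counts toward the coverage figures, but it verifies nothing. As a minimal sketch of what a checked variant might look like, the block below adds assertions to the DO_SOMETHING test. The program name zdemo_unit_checked, the class name lcl_test_checked and the integer declarations of A, B, C and D are assumptions made for illustration; the article does not show the declarations.

```abap
program zdemo_unit_checked.

" Assumption: the globals are plain integers (not shown in the article).
data: a type i,
      b type i,
      c type i,
      d type i.

form do_something.
  c = a.
  d = b.
endform.

class lcl_test_checked definition for testing duration short risk level harmless.
  private section.
    methods setup.
    methods do_something for testing.
endclass.

class lcl_test_checked implementation.
  method setup.
    " Shared initialization, executed before every test method.
    a = 7.
    b = 77.
  endmethod.

  method do_something.
    perform do_something.
    " The subroutine copies A to C and B to D.
    cl_abap_unit_assert=>assert_equals( act = c exp = 7  msg = 'C should equal A' ).
    cl_abap_unit_assert=>assert_equals( act = d exp = 77 msg = 'D should equal B' ).
  endmethod.
endclass.
```

The coverage numbers discussed in this article do not change when assertions are added; only the usefulness of a failing test does.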

Procedure coverage

This is the easiest case; you can count it on your fingers. Procedure coverage will be 100% = (1 + 1 + 1) / (1 + 1 + 1) * 100.

Statement coverage

And what if we count the number of statements in the same procedures?

Each procedure contains a different number of statements. And with the given input values, not all of them will be executed:

DO_SOMETHING: three statements executed out of three
DO_SOMETHING_ELSE: five statements executed out of eight
DO_NOTHING: two statements executed out of three

The statements are counted simply: ordinary statements count, the procedure itself counts as a statement, and a condition counts as a statement. The ENDIF that closes a condition is not counted, because it only marks a jump target and is not associated with any calculation or action.

Calculated by statements, the metric comes to 71% = (3 + 5 + 2) / (3 + 8 + 3) * 100.

Consider how the metric behaves on DO_SOMETHING_ELSE. The ABAP development tools can color source lines according to the metric. With [A = 7, B = 77], the statements C = A, D = A and D = 1000 are the ones left unexecuted.

Visual, quick, clear. Simply amazing; I did not even expect this from ABAP.

This coloring makes it obvious that with other input values the coverage percentage could be different. In the case of [A = 77, B = 7], only the IF A > B branch runs and the entire ELSE part stays unexecuted.

It also becomes obvious that full coverage by this metric can only be achieved with more than one test scenario. For example, with the two tests [A = 77, B = 7] and [A = 7, B = 7777], everything turns green.

Thus the metric reaches 100%. You can relax for a while.

Branch coverage

This metric is a bit more demanding. It takes every statement that can cause branching and checks that each such statement has been executed in both directions.

Let's look at it using the last example.

The first statement [IF A > B] was evaluated twice over the two tests: once as TRUE [A = 77, B = 7] and once as FALSE [A = 7, B = 7777].

But the second statement [IF D > 1000] was evaluated only once, as TRUE [A = 7, B = 7777].

The procedure call itself counts as one unconditional unit; the first IF gives two out of two, and the second IF gives only one out of two. So the metric equals 80% = (1 + 2 + 1) / (1 + 2 + 2) * 100.

So it turns out that two tests are not enough for this one procedure; three are needed. To the previous two you can add the scenario [A = 7, B = 77], so that the second IF evaluates to FALSE.

After adding the third scenario, the metric for this procedure is 100%.
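The three scenarios discussed above could be written as three test methods, roughly as follows. This is a sketch under assumptions: the program name, the class name, the method names and the integer declarations are hypothetical, and the expected values are derived from the subroutine as listed earlier in the article.

```abap
program zdemo_unit_branches.

" Assumption: the globals are plain integers (not shown in the article).
data: a type i,
      b type i,
      c type i,
      d type i.

form do_something_else.
  if a > b.
    c = a.
    d = a.
  else.
    c = b.
    d = b.
    if d > 1000.
      d = 1000.
    endif.
  endif.
endform.

class lcl_test_branches definition for testing duration short risk level harmless.
  private section.
    methods a_greater_than_b for testing.
    methods b_above_limit for testing.
    methods b_below_limit for testing.
endclass.

class lcl_test_branches implementation.
  method a_greater_than_b.
    " First IF evaluates to TRUE.
    a = 77.
    b = 7.
    perform do_something_else.
    cl_abap_unit_assert=>assert_equals( act = d exp = 77 ).
  endmethod.

  method b_above_limit.
    " First IF evaluates to FALSE, second IF to TRUE.
    a = 7.
    b = 7777.
    perform do_something_else.
    cl_abap_unit_assert=>assert_equals( act = d exp = 1000 ).
  endmethod.

  method b_below_limit.
    " First IF evaluates to FALSE, second IF to FALSE.
    a = 7.
    b = 77.
    perform do_something_else.
    cl_abap_unit_assert=>assert_equals( act = d exp = 77 ).
  endmethod.
endclass.
```

The first two methods alone already execute every statement; the third exists only to drive the second IF in the other direction.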

And what about DO_NOTHING, you may ask? No test exists that would bring its branch or statement metric to 100%. Obviously, the procedure needs refactoring, without which full coverage cannot be reached. It should either be removed or turned from DO_NOTHING into DO_SOMETHING_COMPLETELY_DIFFERENT.

One hundred percent!

It is a pity that you cannot write more tests and get more than 100%.

Clearly, procedure coverage is the least revealing metric in terms of detail. It is worth watching closely only in the early stages, when there is a lot of code and almost no tests yet. But which of the two remaining metrics should you keep an eye on? While the first metric only shows how widely you have covered the functionality, the other two show how well you have covered it.

As you have noticed, you can reach 100% by statements without reaching 100% by branches. But not the other way around (or at least I cannot think of such an example). If you have reached 100% by branches, you have walked down every back alley and every statement has been executed. Still, some may feel that the branch metric is less representative on average, because it ignores one obvious weight: the number of lines of code, that is, the number of statements.
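The smallest case of this gap is an IF without an ELSE. The sketch below is hypothetical (the subroutine limit_d, the program name and the class name are invented for illustration) and applies the counting rules described in the sections above.

```abap
program zdemo_unit_gap.

" Assumption: D is a plain integer.
data d type i.

form limit_d.
  if d > 1000.
    d = 1000.
  endif.
endform.

class lcl_test_gap definition for testing duration short risk level harmless.
  private section.
    methods above_limit for testing.
endclass.

class lcl_test_gap implementation.
  method above_limit.
    " The only scenario: the IF evaluates to TRUE.
    d = 7777.
    perform limit_d.
    cl_abap_unit_assert=>assert_equals( act = d exp = 1000 ).
  endmethod.
endclass.
```

By that counting, this single test executes all three statements (the procedure, the IF and the assignment), so statement coverage is 100%, while the IF has only been evaluated as TRUE, so branch coverage is (1 + 1) / (1 + 2) * 100, about 67%.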

BTW: Yes, an empty procedure gives 100% coverage!

A deal is a deal

For ABAP Unit to work, it does not matter:

how many test classes you have in total;
what the test class is called;
where it is located;
what its methods are called.

What matters is that the local class:

exists;
is marked “for testing”;
has methods marked “for testing”.

On the other hand, even variable names are not a random set of letters.

Consequently, we should have some general agreement on each of these points to make the overall picture easier to grasp, like a naming or formatting convention.

What?

There should be exactly as many test classes as needed. At a minimum, each large object (function group, program, class) should have one test class; more are possible.

If you have a simple group of several related functions, one class is enough for it. But if your group contains six bundles of loosely related functions, then the real question is “How many function groups should there be?”, and that is a topic for a completely different conversation.

Once that question is answered correctly, you can use the SETUP method as the criterion for splitting. A class can have only one such method, and it is called automatically before each test method.
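One way to read the SETUP criterion: scenarios that need the same initial state share a test class, and a different initial state calls for another class. The following is only a sketch; the program name, the two class names, the method names and the integer declarations are assumptions, and the subroutine is the DO_SOMETHING_ELSE example from earlier in the article.

```abap
program zdemo_unit_setup.

" Assumption: the globals are plain integers (not shown in the article).
data: a type i,
      b type i,
      c type i,
      d type i.

form do_something_else.
  if a > b.
    c = a.
    d = a.
  else.
    c = b.
    d = b.
    if d > 1000.
      d = 1000.
    endif.
  endif.
endform.

" Fixture 1: A is greater than B.
class lcl_test_a_greater definition for testing duration short risk level harmless.
  private section.
    methods setup.
    methods c_takes_a for testing.
    methods d_takes_a for testing.
endclass.

class lcl_test_a_greater implementation.
  method setup.
    a = 77.
    b = 7.
  endmethod.

  method c_takes_a.
    perform do_something_else.
    cl_abap_unit_assert=>assert_equals( act = c exp = 77 ).
  endmethod.

  method d_takes_a.
    perform do_something_else.
    cl_abap_unit_assert=>assert_equals( act = d exp = 77 ).
  endmethod.
endclass.

" Fixture 2: B is greater than A and stays below the limit.
class lcl_test_b_greater definition for testing duration short risk level harmless.
  private section.
    methods setup.
    methods c_takes_b for testing.
    methods d_takes_b for testing.
endclass.

class lcl_test_b_greater implementation.
  method setup.
    a = 7.
    b = 77.
  endmethod.

  method c_takes_b.
    perform do_something_else.
    cl_abap_unit_assert=>assert_equals( act = c exp = 77 ).
  endmethod.

  method d_takes_b.
    perform do_something_else.
    cl_abap_unit_assert=>assert_equals( act = d exp = 77 ).
  endmethod.
endclass.
```

Whether such a split pays off for code this small is debatable; the point is only that each class carries exactly one SETUP, so the fixture defines the class boundary.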

Each scenario should get its own test method, and the name of the method should be derived directly from the code being tested.

Where?

One of the principles of unit testing: a test class should test only the code within whose jurisdiction it resides. And although tests can be placed anywhere in the source code, it is worth keeping the working functionality separate from the tests.

For function groups there is a wizard that creates a separate include program following a predefined pattern, for example LZFI_BTET99 for the function group ZFI_BTE. I see nothing wrong with that; it should be taken as a model and followed in the same spirit.

Likewise in programs of type REPORT: write tests strictly in one separate include program, with a name that follows a pattern.

However, I cannot forbid anyone from writing everything interleaved: code, its test, code, its test...

When?

You cannot run the full cycle of unit tests every five minutes. But at the very least, before releasing the request, you should run the tests for the objects involved.

To sum up

Just a handful of theses to sum up:

You can live with it.
There is more test code than production code.
In many cases the TDD approach is justified and recommended.
Reviewing existing production code that has no tests puts a considerable strain on the brain.
If you try to cover already written code with tests, you often cannot do without refactoring. And refactoring is a somewhat different and harder task than test coverage itself. Refactoring can be postponed until you need to do something else with the code.
If you have production code that has not changed for years, it should be the last to be covered with tests.
Code may remain uncovered because of a vicious circle: you cannot write proper tests because serious refactoring is needed first, and refactoring is not recommended until tests exist.
In some cases test coverage does not work out. It happens. Accept it.
By some metrics, 100% coverage is much easier to reach than by others. That does not mean you should go for whichever is easier.
I have not noticed tests in the standard system code, and even if there were any, running them would be forbidden.

In all the bustle, do not forget the main thing: tests are not an end in themselves. Everything must bring some benefit.

The real benefit of tests shows up only at those moments when, some time later, they fail because someone has reworked the functionality.

Because knowing how to fall properly is the best way to avoid injury. If that is true for karatekas and cyclists, it will serve programmers as well. It is better to fall well while pedaling badly than to fall badly while pedaling well. Knowing how to fall properly matters more than the right gear.

And right now only indirect benefits can be extracted:

Tests document usage scenarios.
Tests help identify places that need attention (refactoring).
Tests help run basic checks in places that are hard to test manually.

That is all for today. Until next time.

Source: https://habr.com/ru/post/274181/
