
Difficulties in comparing code analyzers, or don't forget about usability



The desire of users to compare different code analyzers with each other is understandable and natural. However, fulfilling this desire is not as easy as it may seem at first glance. The trouble is that it is unclear which specific characteristics should be compared.

If we reject outright absurd ideas like “compare the number of diagnosable errors” or “compare the number of messages the tool issues”, then even the reasonable parameter “signal-to-noise ratio” does not look like an ideal criterion for evaluating a code analyzer.
Do you doubt that comparing these parameters is pointless? Here are a few examples.



Parameters that are simply meaningless to compare


Consider a characteristic that looks simple at first glance: the number of diagnostic rules. It seems that the more of them, the better. But for an end user who works with a specific set of operating systems and compilers, the total number of rules means nothing. Diagnostic rules that target systems, libraries and compilers he does not use give him nothing. They even get in the way, cluttering the settings and the documentation and complicating the use and adoption of the tool.

The following analogy is relevant here. A man walks into a store to buy a heater. He is interested in the home appliances department, and it is good if that department offers a wide choice. But the other departments do not interest him. There is nothing wrong with being able to buy an inflatable boat, a cell phone or a chair in the same store. But the presence of an inflatable boat department does not improve the range of heaters.

Take, for example, the Klocwork tool, which supports a large variety of systems, including exotic ones. One of these systems has a compiler that “swallows” the following code:
  inline int x;  // an exotic compiler silently accepts this declaration

The Klocwork analyzer has a diagnostic concerning the 'inline' keyword that detects this anomaly in the code. At first sight it seems good that such a diagnostic exists. But a developer using the Microsoft Visual C++ compiler, or any other adequate compiler, gets no benefit from it: Visual C++ simply does not compile such code: "error C2433: 'x': 'inline' not permitted on data declarations".

Another example. Some compilers have poor support for the bool type. Therefore Klocwork can warn about a situation where a class member has the bool type: "PORTING.STRUCT.BOOL: This struct/class has a bool member."

They wrote bool in a class, oh the horror... It is clear that only a tiny percentage of developers can benefit from such a diagnostic.
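For illustration, here is a minimal sketch (the class and member names are made up) of perfectly ordinary code that such a portability diagnostic would flag:

  // An ordinary class; the warning fires only because some old
  // compilers handle the bool type poorly.
  class ConnectionSettings
  {
    bool m_useProxy;   // PORTING.STRUCT.BOOL would be reported here
    int  m_timeoutMs;
  };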

There are plenty of similar examples. And it turns out that the total number of diagnostic rules says nothing about how many errors the analyzer will find in a particular project. An analyzer that implements 100 diagnostics and is oriented toward Windows applications may find far more errors in a project built with Microsoft Visual Studio than a cross-platform analyzer that implements 1000 diagnostics.

So, a parameter such as the number of diagnostic rules cannot be used to compare how useful analyzers are.

You could say: “Let us then compare the number of checks relevant to a particular system. For example, let us select all the rules that help find errors in Windows programs.” But this approach does not work either, for two reasons:

Firstly, it often happens that one analyzer implements a check as a single diagnostic rule, while another spreads it across several rules. If you compare the number of diagnostic rules, one of the analyzers looks better, although they detect the same errors.

Secondly, different diagnostics can be implemented with different quality. For example, almost every analyzer searches for “magic numbers”. But one analyzer may detect only the magic numbers that are dangerous when porting code to 64-bit systems (4, 8, 32, and so on), while another simply flags all magic numbers (1, 2, 3, and so on). Putting a plus sign for both tools in the “search for magic numbers” row of a comparison table would say almost nothing.
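To make the difference concrete, here is a small sketch (the function and variable names are made up): the first allocation uses a magic 4 that silently assumes 32-bit pointers, which is the kind of code a 64-bit oriented diagnostic is concerned with, while the constant 3 is a magic number only in the formal sense:

  #include <cstdlib>

  void example(size_t n)
  {
    // Dangerous magic number: 4 assumes sizeof(void *) == 4,
    // so the buffer is too small on a 64-bit system.
    void **bad  = (void **)malloc(n * 4);
    // The portable variant a 64-bit oriented diagnostic would suggest.
    void **good = (void **)malloc(n * sizeof(void *));
    // An ordinary magic number that only a "flag everything"
    // implementation would report.
    for (int i = 0; i < 3; ++i) { /* retry */ }
    free(bad);
    free(good);
  }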

Another characteristic people like to cite is the speed of the tool, or the number of lines of code processed per minute. But in practice it makes little sense: there is no connection between the speed of the code analyzer and the speed at which a person works through the project! Firstly, code analysis is often run automatically during nightly builds, and all that matters is that it finishes by the morning. Secondly, when comparing, people often forget about such a parameter as usability. Let us look at this issue in more detail.

Ease of use of the tool is very important for a proper comparison


The point is that how convenient a tool is to use plays a very important role in how code analyzers are actually applied...

Recently we checked the eMule project with two code analyzers, evaluating how convenient the process was. One of them was the static analyzer built into some editions of Visual Studio. The other was our PVS-Studio. We immediately ran into several problems when working with the analyzer built into Visual Studio. Moreover, these problems have nothing to do with the quality of the analysis itself or its speed.

The first problem is the impossibility of saving the list of analyzer messages for further work. For example, when I checked eMule with the built-in analyzer, I got two thousand messages. It is impossible to process them all thoughtfully in one sitting, so you have to come back to them over several days. But because the analysis results cannot be saved, you have to re-check the project every time, which is very tiring. PVS-Studio lets you save the analysis results and return to them later.

The second problem is related to how duplicate analyzer messages are handled. This concerns diagnostics for problems in header (.h) files. Suppose the analyzer detects a problem in an .h file that is included in ten .cpp files. While analyzing each of these ten .cpp files, the analyzer built into Visual Studio reports the same problem in the .h file ten times! A specific example: when checking eMule, the message
  c:\users\evg\documents\emuleplus\dialogmintraybtn.hpp(450):
  warning C6054: String 'szwThemeColor' might not be zero-terminated:
  Lines: 434, 437, 438, 443, 445, 448, 450

was issued more than ten times. Because of this, the analysis results get cluttered, and you keep viewing the same messages over and over. It should be noted that PVS-Studio has filtered out duplicate messages from the very beginning instead of displaying them.
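The mechanics are easy to reproduce with a hypothetical header and two source files: the analyzer processes each .cpp file as a separate translation unit, so the same defect in the shared header is reported once per including file:

  // shared.h (hypothetical file): the defect lives here
  #include <cstring>
  inline void CopyName(char *dst, const char *src)
  {
    strncpy(dst, src, 32);  // dst might not be zero-terminated
  }

  // a.cpp:  #include "shared.h"   -> warning reported while analyzing a.cpp
  // b.cpp:  #include "shared.h"   -> the same warning reported again, and so on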

The third problem is warnings about problems in system include files (from folders like C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include). The analyzer built into Visual Studio does not hesitate to stigmatize system header files, although there is little practical sense in that. Again, an example: when checking eMule, the same message about a system file came up over and over:
  1> c:\program files (x86)\microsoft
  sdks\windows\v7.0a\include\ws2tcpip.h(729):
  warning C6386: Buffer overrun: accessing 'argument 1',
  the writable size is '1*4' bytes,
  but '4294967272' bytes might be written:
  Lines: 703, 704, 705, 707, 713, 714, 715, 720,
  721, 722, 724, 727, 728, 729

In any case, nobody is going to edit system files. So why “swear at” them? PVS-Studio has never complained about system files.

Related to this is the inability to tell the tool not to check certain files by mask, for example all "*_generated.cpp" files or everything under "c:\libs\". In PVS-Studio you can specify files to exclude.

The fourth problem concerns the actual work with the list of analyzer messages. In any code analyzer you can, of course, disable any diagnostic by its code. The question is how conveniently this can be done, or more precisely, whether the analysis has to be restarted to hide unwanted messages by code. In the analyzer from Visual Studio you have to enter the codes of the disabled messages in the project settings and then restart the analysis. And of course you are unlikely to list all the “extra” diagnostics on the first try, so the restart has to be repeated several times. In PVS-Studio you can easily hide and show messages by code without restarting, which is much more convenient.

The fifth problem is filtering messages not only by code but also by text. For example, it is useful to hide all messages containing "printf". The analyzer built into Visual Studio has no such option; PVS-Studio does.

Finally, the sixth problem is how convenient it is to tell the tool that a message is a false positive. The #pragma warning disable mechanism used in Visual Studio hides the message only after the analysis is restarted. This differs from the mechanism in PVS-Studio, where messages can be marked as “False Alarm” and hidden without restarting the analysis.
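A rough sketch of the difference (the function name and the specific warning codes are illustrative, and the exact PVS-Studio marker format may vary): in Visual Studio the suppression lives in a pragma and takes effect only on the next run of the analysis, while PVS-Studio appends a marker comment to the line and hides the message at once:

  // Visual Studio analyzer: the pragma suppresses the C6xxx warning
  // on the next line, but only after the analysis is re-run.
  #pragma warning(suppress: 6054)
  ProcessString(szwThemeColor);

  // PVS-Studio: marking a message as "False Alarm" appends a comment
  // of the //-Vnnn form; the message disappears without a re-run.
  ProcessString(szwThemeColor); //-V512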

All six of these problems do not affect the quality of the analysis itself, but they are very important. After all, the ease of use of a tool is the integral indicator that determines whether it will ever actually come to evaluating the quality of the analysis.

Overall we get the following picture. The static analyzer built into Visual Studio checks the eMule project several times faster than PVS-Studio does. But working through the results of the Visual Studio analyzer took 3 days (actually less, but I had to switch to other tasks to rest). Working through the PVS-Studio results took only 4 hours.

Note. As for the number of errors found, both analyzers showed roughly equal results and found the same errors in this project.

Conclusion


Comparing static analyzers with one another is a complex and multifaceted task. It is impossible to answer the question of which tool is better in general. You can only talk about which tool is better for a particular project and a particular user.

Source: https://habr.com/ru/post/116317/

