Communicating with people on forums, I have noticed several persistent misconceptions about the static analysis methodology. I want to dispel the following myths:

- A static analyzer is a tool for one-time use.
- Professional developers do not make stupid mistakes.
- Dynamic analysis is better than static analysis.
- Programmers want to add their own rules to a static analyzer.
The first myth - a static analyzer is a tool for one-time use.
Here is what this statement looks like in forum discussions (a collective image):
Having a trial/cracked version, you can rid all your projects of errors for free, find some old bugs and, in effect, rest easy for a while. Everyone is happy. The project has been checked. The developers of the analyzer do not even know that they were deceived and robbed.
In this case, the programmer deceived himself, not the creators of the tool. He received only the appearance of benefit from the work done, not real benefit. So far I have not managed to get this idea across, but I will keep trying. There is no benefit in one-time runs of a static analyzer.
Analogy:
We set the compiler warning level to /W0 and develop the project. We curse, fix stupid mistakes and typos, and test longer and harder. Then we occasionally turn on /W3, fight with the warnings, and go back to /W0 again. Meanwhile, what the compiler could have told us at the /W3 level, we bravely and at great length hunted down in the debugger, spending 10-100 times more time on it. Also note that the programmer will not like the /W3 output now: he has already corrected almost all of these mistakes through testing and debugging, so at the /W3 level the compiler now produces mostly false positives.
Now back to static analysis. The picture is exactly the same. A rare run of the analyzer produces a lot of false positives. There are few real errors, because they have already been found by other methods.
Just like with the /W3 switch, static analysis is most useful when applied regularly. By the way, static analysis is in a sense an extension of the warnings issued by the compiler. Many diagnostics that were once implemented only in analyzers are gradually migrating into compilers. Of course, analyzers will always be ahead of compilers in terms of diagnostics; that is what they are built for. The compiler has many other concerns and, in addition, much stricter performance requirements.
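To make the analogy concrete, here is a minimal sketch (invented for illustration, not taken from any of the projects mentioned) of the kind of typo that stays silent at a low warning level but is reported at a higher one, depending on the compiler:
#include <stdio.h>
// A typical typo: '=' written instead of '=='. At a low warning level the
// code compiles silently; at a higher level (for example /W4 in MSVC or
// -Wall in GCC/Clang) the compiler warns about the assignment used as a
// condition.
void ReportStatus(int errorCode)
{
  if (errorCode = 0)                         // intended: errorCode == 0
    printf("finished successfully\n");
  else
    printf("failed with code %d\n", errorCode);
}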
Some people answer in the heat of the discussion:
The idea is true for beginner students. For professionals, it is not so important. If I set /W0, I will not start writing worse code. It is necessary to improve one's programming style, not to pile up crutch tools. I completely agree with everything written above. But let's play with the text and rewrite it like this:
The idea is true for novice drivers. For professionals, it is not so important. If I do not wear a seat belt, I will not start driving worse. It is necessary to improve one's driving style, not to add more safety devices. Again, you cannot really argue with this. Yet any sensible driver understands that it is still useful to buckle up. Static analysis is useful in the same way: even the most experienced programmer is not immune to errors and typos. The examples given in this article are good confirmation of that. Of course, every professional programmer is sure that they do not make such stupid mistakes, but more on that below.
The second myth - professional developers do not make stupid mistakes.
The second myth: "Professional developers do not make the stupid mistakes that static code analyzers mostly catch."
Here is what this statement looks like in forum discussions (a collective image):
I, a professional developer, have not had problems with memory corruption, object lifetimes, and so on for N years. Static analysis is a tool for McDonald's, while here (on a professional forum) there are geeks. My main problems now are hard-to-test algorithms and integration with other developers who rely on implicit contracts about object states. It sounds as if typos and careless mistakes were only the lot of students: professional developers supposedly stopped making them long ago, and their real troubles come from complex errors such as synchronization problems or intricate data-processing algorithms.
This is not true. All programmers make stupid mistakes. I know you did not hear me, so I will repeat this heretical thought once more: all programmers make stupid mistakes. It does not matter how professional they are. People are prone to error, and most often these errors are simple ones.
Programmers take my statement about errors very unkindly. In their opinion, they have not made such mistakes for a very long time. I think this is an interesting feature of the psyche that erases the memory of uninteresting programming moments.
Let us digress a little and recall why various horoscopes are so persistent. The first reason is very vague wording that is easy for a person to fit to themselves. But we are interested in the second component: people do not remember the cases when a prediction did not come true, yet they remember and retell very well the cases when their life situation coincided with the one described in the horoscope. As a result, when recalling and discussing horoscopes, we find N confirmations that horoscopes work and forget about the N * 10 cases when they did not.
Something similar happens when a programmer is looking for errors. He remembers complex and interesting mistakes very well; he can debate them with colleagues or write a blog post about them. But when he notices that he wrote 'BA' instead of the variable 'AB', he simply fixes it, and the fact immediately vanishes from his memory. Freud pointed out this feature of memory: it is human nature to remember statements that flatter us and to forget those that do not. If a person battles a complex error in an algorithmic problem, then once he fixes it he feels like a hero. That is worth remembering and even telling others about. When he finds a stupid typo, there is no reason or desire to remember it.
What evidence do I have? Although most typos and blunders get corrected, some of them still go unnoticed. Many examples of such errors can be found in this article, which clearly shows that the mistakes were made not by beginners but by qualified programmers.
Conclusion: programmers spend much more time fixing typos than they think. Static analysis tools can save developers considerable effort by catching some of these errors before the testing phase.
The third myth - dynamic analysis is better than static analysis
The third myth: "Dynamic testing with tools such as Valgrind for C/C++ is much more productive than static code analysis."
The statement is rather strange. Dynamic and static analysis are simply two different methodologies that complement each other. Programmers seem to understand this, yet again and again I hear that dynamic analysis is better than static analysis.
I will list the strengths of static code analysis.
Diagnostics of all branches of the program
In practice, dynamic analysis cannot cover all the branches of a program. After hearing this, Valgrind fans say that one simply needs to write the right tests. Theoretically they are right, but anyone who has tried understands the complexity and volume of such work. In practice, even good tests cover no more than 80% of the program code.
This is especially noticeable in code that handles non-standard or emergency situations. If you take an old project, most of the errors a static analyzer reveals will be in such places. The reason is that even in an old project these sections remain practically untested. Here is a very short example to show what I mean (the FCE Ultra project):
fp = fopen(name, "wb");
int x = 0;
if (!fp)
  int x = 1;
The 'x' flag will never become one, even if the file was not opened: the 'if' branch declares a new local variable 'x' instead of assigning to the existing one. It is because of errors like this that, when something goes wrong, programs crash or print meaningless messages instead of adequate error reports.
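For clarity, here is a minimal sketch of the intended fix, assuming the surrounding code later checks the outer 'x': assign to the existing flag instead of declaring a new local variable.
fp = fopen(name, "wb");
int x = 0;          // 0 means the file was opened successfully
if (!fp)
  x = 1;            // assign to the existing flag instead of redeclaring it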
Scalability
To check large projects regularly with dynamic methods, you have to build a special infrastructure: you need special tests and parallel runs of several instances of the application with different input data.
Static analysis scales much more easily. As a rule, it is enough to give the static analyzer a machine with a large number of cores.
Higher-level analysis
A dynamic analyzer has the advantage of knowing which function is called with which arguments, so it can check whether the call is correct. In most cases static analysis cannot determine and check the argument values; that is a minus. But static analysis works at a higher level than dynamic analysis, which lets it find things that look perfectly fine from the dynamic point of view. A simple example (the ReactOS project):
void Mapdesc::identify(REAL dest[MAXCOORDS][MAXCOORDS])
{
  memset(dest, 0, sizeof(dest));
  for (int i = 0; i != hcoords; i++)
    dest[i][i] = 1.0;
}
From the point of view of dynamic analysis, everything is fine here. But static analysis will raise the alarm, because it is very suspicious that the number of bytes being zeroed equals the size of a pointer: 'dest' is a function parameter, so the array decays to a pointer and sizeof(dest) yields the pointer size rather than the size of the whole array.
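One possible fix, sketched here under the assumption that the intent is to zero the whole MAXCOORDS x MAXCOORDS matrix, is to spell out the size explicitly instead of relying on sizeof(dest):
void Mapdesc::identify(REAL dest[MAXCOORDS][MAXCOORDS])
{
  // sizeof(dest) would be only the size of a pointer here, because the
  // array parameter decays; compute the full matrix size explicitly.
  memset(dest, 0, sizeof(REAL) * MAXCOORDS * MAXCOORDS);
  for (int i = 0; i != hcoords; i++)
    dest[i][i] = 1.0;
}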
Or another example from the Clang project:
MapTy PerPtrTopDown;
MapTy PerPtrBottomUp;

void clearBottomUpPointers() {
  PerPtrTopDown.clear();
}

void clearTopDownPointers() {
  PerPtrTopDown.clear();
}
What can a dynamic analyzer suspect here? Nothing. A static analyzer, however, can become suspicious. The error is that clearBottomUpPointers() should contain "PerPtrBottomUp.clear();".
The fourth myth - programmers want to add their own rules to the static analyzer
The fourth myth: "A static analyzer should allow users to add custom rules. Programmers want to add their own rules."
No, they do not. What they actually want is to solve particular tasks of finding specific language constructs, and that is not the same thing as creating diagnostic rules.
I have always answered that implementing their own rules is not what programmers really want, and I saw no alternative other than the analyzer's developers implementing diagnostics at programmers' request (see an article on this topic). Recently I talked at length with Dmitry Petunin, who heads compiler testing and the development of software verification tools at Intel. He broadened my understanding of the subject and voiced an idea that I had thought about but had never fully formulated.
Dmitry confirmed my belief that programmers will not write diagnostic rules. The reason is very simple: it is very hard. A number of static analysis tools do allow the rule set to be extended, but this is done more for show, or for the convenience of the tool's own authors. Developing a new diagnostic requires very deep immersion in the subject; if an enthusiast without experience takes it on, his rules will bring little practical benefit.
This is where my understanding of the question used to end. Dmitry, with his greater experience, extended it. In short, the situation looks like this.
Programmers really do want to find certain patterns and errors in their code; they genuinely need it. For example, someone needs to find all explicit casts from int to float. This task cannot be solved with tools like grep: in a construct such as "float(P->FOO())" the text alone does not tell you what type the function FOO() returns. At this point the programmer concludes that he could implement the search for such constructs by adding his own check to a static analyzer.
Here lies the key point. The person does not need to create his own analysis rules; he needs to solve one particular problem. What he wants is a tiny task by the standards of static analysis machinery. It is like using a car only to light cigarettes from its cigarette lighter.
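A minimal sketch of why plain text search is not enough (the class and function names below are invented for illustration): the two calls look identical to grep, but only one of them is a cast from int to float, and telling them apart requires knowing the return types.
struct A { int   FOO() const { return 1;    } };
struct B { float FOO() const { return 1.0f; } };

// Textually identical constructs; only type information distinguishes them.
float UseA(const A *P) { return float(P->FOO()); }  // int -> float: the cast we want to find
float UseB(const B *P) { return float(P->FOO()); }  // float -> float: not an int-to-float cast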
That is why neither I nor Dmitry supports the idea of giving users an API for working with the analyzer. It is an extremely difficult task from the development standpoint, and at the same time a user is unlikely to ever use more than 1% of it. Irrational. It is easier and cheaper for the tool's developers to implement users' wishes than to create a complex API for extension modules or a special language for describing rules.
A reader will object: "then expose only that 1% of the functionality in the API and everyone will be happy." Yes, that is correct. But notice how the emphasis has shifted: from developing one's own rules we have arrived at the conclusion that a grep-like tool is enough, as long as it has some additional information about the program code.
There is no such tool yet. If you want to solve a problem of this kind, you can write to me, and we will try to implement it in the PVS-Studio analyzer. For example, we recently implemented several such requests for finding explicit type conversions: V2003, V2004, V2005. Implementing such requests is much easier for us than creating and maintaining an open interface, and it is easier for the users themselves.
By the way, such a tool may eventually appear as part of Intel C++. Dmitry Petunin said they had discussed the possibility of creating a grep-like tool that knows the structure of the code and the types of variables, but the discussion was abstract, and I do not know whether they actually plan to build it.