
The question of how to add our own diagnostic rule to our static analyzer PVS-Studio is often asked to us. And we always answer that it is very simple to do this: “We just need to write us a letter, and we will add this rule to the analyzer”. Such an interface for adding new rules is convenient for the user. This is the best and most convenient interface. To do the most similar work is not as easy as it seems. In this article, I will show the underwater part of the iceberg, which is hidden behind the concept of “added this simple rule”.
Let's go straight to the cleanliness. There is no mechanism in PVS-Studio for the user to write his own rules himself. This is primarily an architectural constraint. And this can upset some users. However, in fact, such users do not understand what they want. Because in life the process of adding a new rule is quite complicated.
')
Here, for example, is the rule
V536 : Be advised that the “utilized octal constant” is used. It is designed to detect errors such as this in Miranda IM:
static const struct _tag_cpltbl
{
unsigned cp;
const char * mimecp;
} cptbl [] =
{
{037, "IBM037"}, // IBM EBCDIC US-Canada
{437, "IBM437"}, // OEM United States
{500, "IBM500"}, // IBM EBCDIC International
{708, "ASMO-708"}, // Arabic (ASMO 708)
...
}
It says 037 instead of 37 just for alignment, although it turned out to be decimal 31, since 037, according to the rules of C ++, is octal. The error is pretty simple to diagnose, right? In fact, we are looking for a constant, see if it starts from 0 and issue a diagnostic message. This diagnostic rule can be implemented in an hour. And the user of the static analyzer (a professional in programming, but a beginner in the development of such tools) is ready in theory to implement such a rule.
However, he will not be able to do it. Rather, it will work, but he will not be satisfied with the result due to the large number of false positives.
We (the developers of PVS-Studio) do the following. Before implementing a rule, we first write unit tests on it, where we mark in a special way the lines to which it should work, and also leave lines in the unit test, where the rule should not issue anything. Then the first version of the diagnostic code is written. Having achieved the passage of unit tests written for this new rule, we run already complete unit tests. After all, the new rule could break something. Full unit tests pass in a few tens of minutes. Of course, from the first time “breaking nothing” is not always possible, so the iterations are repeated until the unit passes the complete tests.
After unit tests, the next stage is launched on real projects. There are about 70 projects in our test base (for 2011). These are well-known and not very open source projects on which we test our analyzer. We have our own samopisnaya launcher, which opens projects in Visual Studio, starts the analyzer, saves the log, compares it with the standard and shows the differences (which messages were lost, which were added, and which changed). One run cycle of all projects for Visual Studio 2005 lasts 4 hours on a machine with four cores and eight gigabytes of memory (analysis is parallelized). According to the results of the analysis, we see the results of the introduction of the new rule. Usually, we begin to look at the differences without waiting for the analysis to be completed, because sometimes it is immediately obvious that “everything is broken”. However, even if everything works and nothing is broken, the tests often stop earlier. The reason for this is false positives. If we see that the diagnostic rule has too many positives, then exceptions appear in the rule.
Let us return to the search for incorrect octal numbers - rule V536. There are a number of exceptions. Here are the situations when the diagnosis is not issued:
- Constant value is less than 8.
- The number is declared via #define.
- In one block there are more than two octal numbers (the rule worked more than two times), except for 0, of type char.
- This number is used as an argument in functions: _open, mknod, open, wxMkdir, wxFileName :: Mkdir, wxFileName :: AppendDir, chmod.
- This is a nested block of numbers. We check only the top one. Example: {{090}, {091}, {092}, {093}}.
- This number <= 0777 and is part of the statement, where there is a word: file, File.
- The number is a function argument, one of the parameters of which contains the string "% o".
It is not always possible to predict exceptions before launch on real projects. Therefore, the iteration "exception - test run - analysis of results" can be repeated many times.
Then, when all exceptions are implemented, and the number of false positives is adequate, the test results (the safe log files of detected errors) are declared reference. Then run the same tests for other versions of Visual Studio. Now (2011) we have three versions of Visual Studio 2005/2008/2010 supported, tests are performed in them, respectively, 4 / 4.5 / 5 hours, which gives a total of 14-15 hours of time. According to the results of a run in a different version of Visual Studio, it may be found that a different number of diagnostic messages are issued for the same code depending on the version of Visual Studio. This is usually due to differences in the header files of different SDKs. We try to eliminate the differences so that the test run in all versions of Visual Studio gives the same results.
Why is it so important to drive away all these tiresome tests? Is it only because of false positives? No, it's not just the false positives. The point is also that implementing the rules it is very difficult to take into account all possible types of structures and write the rule so that it works correctly and moreover does not lead to the fall of the analyzer. That's what this huge amount of tests is for!
What strange constructions are we talking about? I will give a couple of examples.
Do you know that you can declare a variable, for example, something like this:
int const unsigned static a22 = 0;
Do you know that __int64 can be a variable name?
int __identifier (__ int64);
Do you know that after the switch braces are not required?
switch (0)
if (x) foo ();
Even if you yourself do not use such constructs in your code, they are in used libraries.
Finally, after passing all the tests you can take up the documentation. For each diagnostic rule we write a certificate in Russian and English. Translation to English also takes time. Now (when the code is debugged, the tests are passed, the documentation is written) the rule will fall into the next release of PVS-Studio.
So, the development cycle of the new diagnostic rule can be briefly presented as follows:
- The idea of ​​a new rule. Or we come up with, or users suggest, or pry somewhere.
- Formulation rules.
- The implementation of the rule.
- All sorts of testing.
- Analysis of the results. Possible options:
- Revision of the rule formulation, introduction of exceptions, go to item 2.
- The implementation of the rule is considered established, you can write and translate documentation.
The total time for the implementation of one rule is several days. As a rule, it is from 2 to 5 days. Naturally, with experience, we can already predict what exceptions to the rule will be needed and immediately implement them. And the number of test runs can be reduced. However, in any case, the key point is that only on the basis of test runs, exceptions can be formulated and most of the false positives can be overcome.
Now back to the question of how to add the rule to the user himself. As you can see, it is difficult to do it yourself for two simple reasons. The user does not have the qualifications sufficient to come up with successful exceptions and there is no good test base to run the rule on the tests.
Then the question arises: “Why, in some well-known and respected static analyzers, are there ways to set our own user rules?” Maybe we just can't do this? Partly yes. But only the mechanism for setting your own rules serves one simple purpose - to create “comparison matrices with competitors”:

And what, the reader will ask, will there never be a user rule in PVS-Studio? Someday we will have the strength to realize this and not to lose in the comparison matrices :-). In the meantime, we will all who want to create a new rule give a link to this text and say: “We have the best interface for adding custom rules - just write us a letter!”