📜 ⬆️ ⬇️

Check PVS-Studio with Clang

Checking PVS-Studio with Clang
Yes Yes. You heard right. This time the article is “the opposite”. We didn’t check a project, but checked our analyzer with another tool. In fact, we have done this before. For example, they checked PVS-Studio with Cppcheck, using the analyzer built into Visual Studio, and looked at Intel C ++ warnings. But earlier there was no reason to write an article. There was nothing interesting. But Clang could be interested in its diagnostic messages.

We checked Clang twice with PVS-Studio [ 1 , 2 ] and each time we found something interesting. But we could not check PVS-Studio at all. Clang developers have long reported that they are good at building projects in Windows developed using Visual C ++. But in practice, somehow it does not work. Or we have no luck.

And the other day we suddenly realized that we could easily test our analyzer with the help of Clang. It was just necessary to approach the task from the other side. Every night the command-line version of PVS-Studio under Linux is built with the help of GCC. And the GCC compiler is very easily replaced by Clang. Accordingly, we can easily try to check PVS-Studio. And really. On the same day, as a bright idea came to mind to one of the employees, we had a report on the verification of PVS-Studio. Now I sit and talk about the contents of the report and my impressions.

Impression of html reports


I, of course, already dealt with Clang. However, it is difficult to judge the quality of analysis on third-party projects. I often cannot understand whether there is an error or not. It is especially scary when Clang shows that you need to view the path of 37 points in the source code.
')
The PVS-Studio code, on the contrary, is familiar to me, and I was finally able to fully work with the report issued by Clang. Unfortunately, this confirmed my earlier impressions that the often shown way of reaching a detected error is redundant and confuses the programmer. I understand that it is extremely difficult to produce key points of program execution and build such a path. We, for example, in PVS-Studio are generally afraid to take on such a voluminous task. However, since Clang implements the display of this path, it is clearly necessary to go further in improving this mechanism.

Otherwise, such points only knock down, litter the conclusion and complicate understanding:



The figure shows the "point N4". Somewhere below is an error. I understand that an error will occur only if the condition is not met. This is what Clang reports. But why display this information? And so it is clear that if the condition is true, then there will be an exit from the function and there will be no error. Extra meaningless information. There is a lot of such redundant information. It is obviously possible and worth improving this mechanism.

However, I want to give credit to the Clang developers. Often, showing such a path really helps to understand the cause of the error, especially when more than one function is involved in it. And, unequivocally, the Clang developers have shown the way of reaching the error much better than in the Visual Studio 2013 static analyzer. Half of the function consisting of 500 lines is often tinted there. And what gives this coloring is completely incomprehensible.

Criticality of errors found


PVS-Studio check is a very good example of how ungrateful it is to show the benefits of static analysis on a working, well-tested project. In fact, I can "otmazatsya" from all the errors, saying that:

"Having otmazavshis", I will pretend that I do not make serious mistakes and with a proud look I can say that Clang is suitable only for beginners in programming.

Of course, so I will not say! The fact that critical errors were not found does not at all indicate that Clang is weak in analysis. The absence of such defects is the result of a lot of testing work using various methods:

After such a layered defense, it is difficult to expect that Clang will find 20 dereferencing null pointers and 10 divisions by 0. However, think about it. Even in a thoroughly tested project, Clang found several errors. This means that with regular use you can avoid a lot of trouble. It is better to correct the error when Clang finds it, than to get the * .i file from the user where PVS-Studio crashes.

Naturally, we made conclusions for ourselves. Now my colleague is in the process of setting up a Clang launch on the server and sending logs to the mail if the analyzer finds something.

About false positives


In total, the Clang analyzer issued 45 warnings. I don’t want to talk about the number of false positives. Better to say that you should definitely fix 12 places.

The fact is that “false positives” are relative matters. Formally, the analyzer is right - the code is bad and suspicious. But this does not mean that a defect was found. Let me explain with examples.

In the beginning, consider the real false positive:
#define CreateBitMask(bitNum) ((v_uint64)(1) << bitNum) unsigned GetBitCountForRepresntValueLoopMethod( v_int64 value, unsigned maxBitsCount) { if (value == 0) return 0; if (value < 0) return maxBitsCount; v_uint64 uvalue = value; unsigned n = 0; int bit; for (bit = maxBitsCount - 1; bit >= 0; --bit) { if ((uvalue & CreateBitMask(bit)) != 0) // Clang: Within the expansion of the macro 'CreateBitMask': // The result of the '<<' expression is undefined { n = bit + 1; break; } .... } 

As I understand it, the analyzer reports that a shift operation may lead to undefined behavior. Apparently, Clang got confused in logic or could not correctly calculate the possible range of values ​​for the maxBitsCount variable. I carefully studied how the function GetBitCountForRepresntValueLoopMethod () is called and could not find a situation when the value in the 'maxBitsCount' variable is too large. I understand shifts [ 3 ]. So I am sure that there is no error.

Self-confidence is good, but not enough. Therefore, I wrote this assert ():
 .... for (bit = maxBitsCount - 1; bit >= 0; --bit) { VivaAssert(bit >= 0 && bit < 64); if ((uvalue & CreateBitMask(bit)) != 0) .... 

And on all tests this assert () did not work. So this is really an example of a false positive of the Clang analyzer.

The pleasant consequence of adding assert () was that Clang stopped issuing the message. It focuses on assert () macros. They tell the analyzer possible ranges of variable values.

There are few such false positives. Much more such messages here:
 static bool G807_IsException1(const Ptree *p) { .... if (kind == ntArrayExpr) { p = First(p); kind = p->What(); // Clang: Value stored to 'kind' is never read .... 

The assignment "kind = p-> What ();" is not used. Previously used, and after the changes, this line has become redundant. The analyzer is right. The string is superfluous, and it should be deleted at least so that it does not confuse the person who will accompany this code.

Another example:
 template<> template<> void object::test<11>() { .... //  nullWalker     . VivaCore::VivaWalker *nullWalker = 0; left.m_simpleType = ST_INT; left.SetCountOfUsedBits(32); left.m_creationHistory = TYPE_FROM_VALUE; right.m_simpleType = ST_INT; right.SetCountOfUsedBits(11); right.m_creationHistory = TYPE_FROM_EXPRESSION; result &= ApplyRuleN1(*nullWalker, left, right, false); // Clang: Forming reference to null pointer .... } 

In the unit test, the null pointer is dereferenced. Yes, you can not do so and ugly. But really want. The point is that it is very difficult to prepare an instance of the VivaWalker class, and in this case the reference to the object is not used at all.

Both examples show code that works. However, I do not call this a false positive. These are the shortcomings that need to be addressed. However, I would not write these warnings in the column “errors found” either. That's why I say that a false positive is a relative concept.

Bugs found


Finally we got to the section where I will show interesting code snippets that Clang found inside PVS-Studio.

Found errors are not critical for the program. I'm not trying to justify myself. But it turned out really so. After editing all the warnings, the regression tests did not reveal any differences in the operation of PVS-Studio.

However, we are talking about the real mistakes and it's great that Clang could find them. I hope now with its regular launches, he will be able to find more serious blunders in the fresh code of PVS-Studio.

Using two uninitialized variables


The code was big and complicated. There is no point in citing it in the article. I made a synthetic code that reflects the essence of the error.
 int A, B; bool getA, getB; Get(A, getA, B, getB); int TmpA = A; // Clang: Assigned value is garbage or undefined int TmpB = B; // Clang: Assigned value is garbage or undefined if (getA) Use(TmpA); if (getB) Use(TmpB); 

The Get () function can initialize the variables A and B. It initialized them or not, it marks in the variables getA, getB.

Regardless of whether the variables A and B are initialized, their values ​​are copied to TmpA, TmpB. This is where uninitialized variables are used.

Why am I saying that this is a non-critical error? In practice, copying an uninitialized variable of the 'int' type does not cause any trouble. Formally, as I understand it, an Undefined behavior occurs. In practice, the garbage is simply copied. Further variables with garbage are not used.

The code was rewritten as follows:
 if (getA) { int TmpA = A; Use(TmpA); } if (getB) { int TmpB = B; Use(TmpB); } 

Uninitialized pointers


Let's look at the function call GetPtreePos (). It is passed links to uninitialized pointers.
 SourceLocation Parser::GetLocation(const Ptree* ptree) { const char *begin, *end; GetPtreePos(ptree, begin, end); return GetSourceLocation(*this, begin); } 

It is not right. The GetPtreePos () function assumes that the pointers will be initialized to nullptr. Here is how it works:
 void GetPtreePos(const Ptree *p, const char *&begin, const char *&end) { while (p != nullptr) { if (p->IsLeaf()) { const char *pos = p->GetLeafPosition(); if (....) { if (begin == nullptr) { // Clang: The left operand of '==' is a garbage value begin = pos; } else { begin = min(begin, pos); } end = max(end, pos); } return; } GetPtreePos(p->Car(), begin, end); p = p->Cdr(); } } 

It only saves you from shame that the GetLocation () function is called when a specific code parsing error occurs in the unit test subsystem. Apparently, this has never happened.

A good example of how static analysis complements TDD [4].

Scary explicit type conversions


There are three similar functions with scary and incorrect type conversions. Here is one of them:
 bool Environment::LookupType( CPointerDuplacateGuard &envGuard, const char* name, size_t len, Bind*& t, const Environment **ppRetEnv, bool includeFunctions) const { VivaAssert(m_isValidEnvironment); //todo: Environment *eTmp = const_cast<Environment *>(this); Environment **ppRetEnvTmp = const_cast<Environment **>(ppRetEnv); bool r = eTmp->LookupType(envGuard, name, len, t, ppRetEnvTmp, includeFunctions); ppRetEnv = const_cast<const Environment **>(ppRetEnvTmp); // Clang: Value stored to 'ppRetEnv' is never read return r; } 

Sodom and Gomorrah. We tried to remove constancy. And then return the resulting value. But, in fact, in the line “ppRetEnv = const_cast ....” the local variable ppRetEnv just changes.

Let me explain where this ugliness came from and how it affects the work of the program.

PVS-Studio analyzer is based on the OpenC ++ library. It almost did not use the keyword 'const'. At any time, you could change anything and anywhere, using pointers to non-constant objects. PVS-Studio inherited this flaw.

They fought with him. But not yet until the end. You enter in one place const, it is necessary to enter in the second place, then in the third and so on. Then it turns out that in certain situations something needs to be changed with the help of the pointer and it is necessary to divide the function into parts or even more extensive refactoring.

The last heroic attempt to make everywhere const one of the idealistic employees took a week to complete and ended in partial failure. It became clear that we needed big changes in the code and editing of some data storage structures. The carrying of light into the kingdom of darkness was stopped. There are several functions of stubs, as discussed, the purpose of which is to make the code being compiled.

What does this error affect? Oddly enough, it seems no matter what. All unit tests and regression tests did not reveal any changes in the behavior of PVS-Studio after the corrections. Apparently, the return value in “ppRetEnv” is not very necessary at work.

Using a potentially uninitialized variable


 v_uint64 v; // Clang: 'v' declared without an initial value verify(GetEscape(p, len - 3, v, notation, &p)); retValue <<= 8; retValue |= v; // Clang: Assigned value is garbage or undefined 

The function GetEscape () may not work correctly and then the variable 'v' will remain uninitialized. The result of the work GetEscape () for some reason checks the macro verify (). How it happened is already unknown.

The error goes unnoticed for the following reason. The function GetEscape () does not initialize the variable only if the PVS-Studio analyzer will work with incorrect text of the program. In the correct text of the program there are always correct escape sequences and the variable is always initialized.

I'm surprised how it worked


 Ptree *varDecl = bind->GetDecl(); if (varDecl != nullptr) { if (varDecl->m_wiseType.IsIntegerVirtualValue()) varRanges.push_back(....); else if (varDecl->m_wiseType.IsPointerVirtualValue()) varRanges.push_back(....); else varRanges.push_back(nullptr); } rangeTypes.push_back(varDecl->m_wiseType.m_simpleType); // Clang: Dereference of null pointer 

The varDecl pointer may be nullptr. However, the last line is always executed. And a null pointer dereference can occur: varDecl-> m_wiseType.m_simpleType.

Why there has never been a fall here is a mystery to me. The only assumption is that we never get here unless the object stores a pointer to a declaration of a variable (declarator of variable). But it is still impossible to rely on it.

Found a very serious mistake, which, if not now, then, someday would definitely manifest itself.

Surprisingly, the fall was not seen here either.


Another amazing place. Apparently, a combination of factors that can lead to null pointer dereferencing is extremely unlikely. At least the fall has not been seen since the writing of this function. But since that moment, it has already been a year and a half. Miracle.
 void ApplyRuleG_657(VivaWalker &walker, const BindFunctionName *bind, const IntegerVirtualValueArray *pReturnIntegerVirtualValues, const PointerVirtualValueArray *pReturnPointerVirtualValues, const Ptree *body, const Ptree *bodySrc, const Environment *env) { if (body == nullptr || bodySrc == nullptr) { VivaAssert(false); return; } if (bind == nullptr) return; if (pReturnIntegerVirtualValues == nullptr && pReturnPointerVirtualValues == nullptr) return; .... size_t integerValueCount = pReturnIntegerVirtualValues->size(); // Clang: Called C++ object pointer is null 

The pReturnIntegerVirtualValues ​​pointer may well be equal to nullptr.

At first glance, it seems that there is an error in the condition and that the operator should be used: "
 if (pReturnIntegerVirtualValues == nullptr && pReturnPointerVirtualValues == nullptr) 

But it is not. The condition is correct. You just need to check that the pointer is not null before dereferencing the pointer. If it is zero, the value 0 must be assigned to the integerValueCount variable. The correct variant is:
 size_t integerValueCount = pReturnIntegerVirtualValues != nullptr ? pReturnIntegerVirtualValues->size() : 0; 

Amazing. So many tests. Runs on 90 open source projects. Plus for the year we checked a lot of other projects. And still, the bug lives in the code. And after all, he would have shown himself to our important potential client.

Glory to static analyzers! Glory Clang!

Other


The analyzer revealed some more errors that should be corrected. It is difficult to describe them, but I don’t want to make synthetic examples. Plus there are a couple of warnings that are absolutely true, but useless. There had to turn off the analysis.

For example, Clang was worried about uninitialized variables when using the RunPVSBatchFileMode () function. But the fact is that the batch run is simply not implemented for Linux, and there is a stub there. And we are not planning to implement it yet.

findings


Use static analyzers in your work.

I think the core of PVS-Studio is extremely tested. However, the Clang static analyzer found 12 real errors. Other places are not errors, but they were code that smells, and I fixed all such places.

The errors found could manifest themselves at the most inopportune moment. Plus, I think this analyzer would help find a lot of errors that were detected by tests. A run of the main regression tests takes about 2 hours. If you find something before, that's fine.

That turned out such an article with advertising Clang. But he deserved that.

Just do not think that other analyzers are of little use. For example, I personally like the Cppcheck analyzer. It is very easy to use and has very clear diagnostics. But it so happened that he did not find a bunch of errors in PVS-Studio, so I cannot write a similar article about him.

And, of course, I recommend using the PVS-Studio analyzer in your work. It is extremely convenient for using Visual C ++ [ 5 ]. Especially do not forget about the automatic analysis that starts after the successful compilation of the changed files.

This article is in English.


If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Checking PVS-Studio with Clang .

Additional links:


  1. Andrey Karpov. PVS-Studio vs Clang .
  2. Andrey Karpov. Static analysis should be applied regularly .
  3. Andrey Karpov. Without knowing the ford, do not go into the water. Part Three (about shift operations) .
  4. Andrey Karpov. How static analysis complements TDD .
  5. Andrey Karpov. PVS-Studio for Visual C ++ .


Read the article and have a question?
Often our articles are asked the same questions. We collected the answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 . Please review the list.

Source: https://habr.com/ru/post/232753/


All Articles