PVS-Studio vs Clang

We randomly checked out the Clang project. I think the result will be curious to a number of developers. Details under the cut.

PVS-Studio now uses the external preprocessor Microsoft Visual C ++, which is a significant drawback. The Visual C ++ preprocessor is extremely slow and has errors that we cannot fix. Yes, don't be surprised that the preprocessor is working out of hands badly, but this does not prevent you from quickly and correctly compiling files in Visual Studio. I do not know exactly how cl.exe works, but from circumstantial evidence I can assume that there are two completely different preprocessing algorithms used for compilation and for creating * .i files. Apparently, this is more convenient in terms of the compiler architecture.

Recently, we have been actively seeking an alternative solution for preprocessing files. And apparently, our choice will stop at the preprocessor implemented in Clang. Preliminary measurements show that it works several times faster than cl.exe, which is very nice and useful.

Clang is a new compiler for C-like languages (C, C ++, Objective-C, Objective-C ++, in the process support for C ++ 11). The development is sponsored by Apple. One of the strengths of the Clang compiler is the large number of static code analysis rules. In practice, Clang is already used by Apple as a static analysis tool.
')
Clang was originally designed to maximize the preservation of information during the compilation process [1]. This feature allows Clang to create deployed context-oriented error messages that are understandable for both programmers and development environments. The modular design of the compiler allows it to be used as part of the development environment for syntax highlighting and refactoring.

So, for good, PVS-Studio, perhaps, should be based on Clang, and not on VivaCore [2]. But now it's too late and everything is not so simple. In this case, we would be too dependent on the capabilities of a third-party library. Yes, and support Microsoft Specific in the project Clang not in a hurry.

However, we were distracted. It is most interesting to check one code analyzer with another code analyzer. We did it, since all the same have already begun to explore the draft Clang.

Unfortunately, a full comparison is impossible. It would be great to check Clang with PVS-Studio and vice versa. And then count the number of errors found per one thousand lines. The trouble is that we can check Clang, but he doesn’t. In Clang, MSVC support is experimental. Plus, the case is complicated by the fact that PVS-Studio already uses the features of C ++ 11. A simple compile attempt results in errors like 'These language extensions are not yet supported'.

We'll have to limit ourselves to the fact that we managed to find PVS-Studio in the Clang code. Naturally, what will be described in the article is not all the errors that were found. Clang is about 50 megabytes of source code. I will notify the developers about the defects found and, if they are interested, they will be able to check their project themselves and examine the entire list of potential errors. However, they heard about us and discussed some of our diagnostics.

Let's see what we found interesting. Almost all errors found are Copy-Paste errors.

Copy-Paste N1 Error

static SDValue PerformSELECTCombine(...) { ... SDValue LHS = N->getOperand(1); SDValue RHS = N->getOperand(2); ... if (!UnsafeFPMath && !DAG.isKnownNeverZero(LHS) && !DAG.isKnownNeverZero(RHS)) ... if (!UnsafeFPMath && !DAG.isKnownNeverZero(LHS) && !DAG.isKnownNeverZero(LHS)) ... }

PVS-Studio Diagnostics: V501 There are identical sub-expressions '! DAG.isKnownNeverZero (LHS)' operator. LLVMX86CodeGen x86isellowering.cpp 11635

In the beginning, the code simultaneously works with LHS and RHS, and then only with LHS. The reason for this, apparently, was a slip or a copied fragment of the string. As I understand it, in the second case the RHS variable must also be present.

Copy-Paste N2 Error

 MapTy PerPtrTopDown; MapTy PerPtrBottomUp; void clearBottomUpPointers() { PerPtrTopDown.clear(); } void clearTopDownPointers() { PerPtrTopDown.clear(); }

PVS-Studio Diagnostics: V524 It’s odd that the body of the clearTopDownPointers function is fully equivalent to the body of the clearBottomUpPointers function (ObjCARC.cpp, line 1318) LLVMScalarOpts objcarc.cpp 1322

Classic Copy-Paste. The function has been copied. Her name was changed, but not the body itself. The correct option is:

 void clearBottomUpPointers() { PerPtrBottomUp.clear(); }

Copy-Paste N3 Error

 static Value *SimplifyICmpInst(...) { ... case Instruction::Shl: { bool NUW = LBO->hasNoUnsignedWrap() && LBO->hasNoUnsignedWrap(); bool NSW = LBO->hasNoSignedWrap() && RBO->hasNoSignedWrap(); ... }

PVS-Studio diagnostics: V501 There are identical sub-expressions 'LBO-> hasNoUnsignedWrap ()' operator. LLVMAnalysis instructionsimplify.cpp 1891

Apparently, they wanted to write like this:

 bool NUW = LBO->hasNoUnsignedWrap() && RBO->hasNoUnsignedWrap();

Copy-Paste N4 Error

 Sema::DeduceTemplateArguments(...) { ... if ((P->isPointerType() && A->isPointerType()) || (P->isMemberPointerType() && P->isMemberPointerType())) ... }

Diagnostics of PVS-Studio. V501 There are identical sub-expressions "P-> isMemberPointerType ()". clangSema sematemplatededuction.cpp 3240

This is just a case, unlike the fifth example. It is clear that they wanted to write like this:

(P-> isMemberPointerType () && A-> isMemberPointerType ())

Error Copy-Paste N5

 static bool CollectBSwapParts(...) { ... // 2) The input and ultimate destinations must line up: if byte 3 of an i32 // is demanded, it needs to go into byte 0 of the result. This means that the // byte needs to be shifted until it lands in the right byte bucket. The // shift amount depends on the position: if the byte is coming from the high // part of the value (eg byte 3) then it must be shifted right. If from the // low part, it must be shifted left. unsigned DestByteNo = InputByteNo + OverallLeftShift; if (InputByteNo < ByteValues.size()/2) { if (ByteValues.size()-1-DestByteNo != InputByteNo) return true; } else { if (ByteValues.size()-1-DestByteNo != InputByteNo) return true; } ... }

PVS-Studio Diagnostics: V523 The 'then' statement is equivalent to the 'else' statement. LLVMInstCombine instcombineandorxor.cpp 1387

And this is an example of a difficult situation. I'm not even sure the end of the error here or not. The comment does not help me either. Here the creators should analyze the code. But still, I think that here is an example of an error with Copy-Paste.

I think about Copy-Paste is still enough, because there are a number of other types of errors.

Here, for example, the classic error in switch

 void llvm::EmitAnyX86InstComments(...) { ... case X86::VPERMILPSri: DecodeVPERMILPSMask(4, MI->getOperand(2).getImm(), ShuffleMask); Src1Name = getRegName(MI->getOperand(0).getReg()); case X86::VPERMILPSYri: DecodeVPERMILPSMask(8, MI->getOperand(2).getImm(), ShuffleMask); Src1Name = getRegName(MI->getOperand(0).getReg()); break; ... }

PVS-Studio Diagnostics: V519 The 'Src1Name' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 211, 215. LLVMX86AsmPrinter x86instcomments.cpp 215

In a large code, such errors are inevitable. The switch statement is very dangerous. One may well know how it works, but still forget this damned 'break'.

There are a number of, if not mistakes, then obviously stupid or suspicious actions.

Stupid action N1, which may be a mistake

 AsmToken AsmLexer::LexLineComment() { // FIXME: This is broken if we happen to a comment at the end of a file, which // was .included, and which doesn't end with a newline. int CurChar = getNextChar(); while (CurChar != '\n' && CurChar != '\n' && CurChar != EOF) CurChar = getNextChar(); ... }

PVS-Studio Diagnostics: V501 Operator: CurChar! = '\ N' && CurChar! = '\ N' LLVMMCParser asmlexer.cpp 149

Most likely, the second check CurChar! = '\ N' is superfluous here. But perhaps this is a mistake and should be written:

 while (CurChar != '\n' && CurChar != '\r' && CurChar != EOF)

Stupid action of N2, which is definitely not a mistake

 std::string ASTContext::getObjCEncodingForBlock(const BlockExpr *Expr) const { ... ParmOffset = PtrSize; // Argument types. ParmOffset = PtrSize; ... }

PVS-Studio diagnostics: V519 The 'ParmOffset' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 3953, 3956. clangAST astcontext.cpp 3956

N3 stupid action, which I find it difficult to describe

 static unsigned getTypeOfMaskedICmp(...) { ... result |= (icmp_eq ? (FoldMskICmp_Mask_AllZeroes | FoldMskICmp_Mask_AllZeroes | FoldMskICmp_AMask_Mixed | FoldMskICmp_BMask_Mixed) : (FoldMskICmp_Mask_NotAllZeroes | FoldMskICmp_Mask_NotAllZeroes | FoldMskICmp_AMask_NotMixed | FoldMskICmp_BMask_NotMixed)); ... }

Diagnostics of PVS-Studio:

V501 There are identical sub-expressions 'FoldMskICmp_Mask_AllZeroes' operator. LLVMInstCombine instcombineandorxor.cpp 505

V501 There are identical sub-expressions 'FoldMskICmp_Mask_NotAllZeroes' to the left and to the right of the '|' operator. LLVMInstCombine instcombineandorxor.cpp 509

I do not know, these are simple duplicates or a different condition must be written. Difficult to assume that there should actually be.

There is a code that is just potentially dangerous.

This code will work as long as the two enum have a similar structure.

 enum LegalizeAction { Legal, // The target natively supports this operation. Promote, // This operation should be executed in a larger type. Expand, // Try to expand this to other ops, otherwise use a libcall. Custom // Use the LowerOperation hook to implement custom lowering. }; enum LegalizeTypeAction { TypeLegal, // The target natively supports this type. TypePromoteInteger, // Replace this integer with a larger one. TypeExpandInteger, // Split this integer into two of half the size. ... }; LegalizeTypeAction getTypeAction(LLVMContext &Context, EVT VT) const; EVT getTypeToExpandTo(LLVMContext &Context, EVT VT) const { ... switch (getTypeAction(Context, VT)) { case Legal: return VT; case Expand: ... }

Diagnostics of PVS-Studio:

V556 The values of different types are compared: switch (ENUM_TYPE_A) {case ENUM_TYPE_B: ...}. LLVMAsmPrinter targetlowering.h 268

V556 The values of different types are compared: switch (ENUM_TYPE_A) {case ENUM_TYPE_B: ...}. LLVMAsmPrinter targetlowering.h 270

Having TypeLegal is consonant with Legal, and the name TypeExpandInteger is consonant with Expand. This was the cause of a typo. The code only works because you are lucky and the values of these names are the same.

Conclusion

Is it terrible when there are errors in the compiler? :)

PS

It seems I hurried to praise Clang. Just came across a situation where, when preprocessing, it corrupts the code. We have the following fragment in atlcore.h:

 ATLASSUME(p != NULL); // Too expensive to check separately here if (*p == '\0') // ::CharNextA won't increment if we're at a \0 already return const_cast<_CharType*>(p+1);

The Clang preprocessor turns it into:

 do { ((void)0); #pragma warning(push) #pragma warning(disable : 4548) do {__noop(!!(p != 0));} while((0,0) #pragma warning(pop) ); } while(0); // Too expensive to check separately here if (*p == '\0') // ::CharNextA won't increment if we're at a \0 already return const_cast<_CharType*>(p+1);

He placed the 'if' operator after the comment and it turned out that “if (* p == '\ 0')” is now also a comment. As a result, we have an incorrect code. Eh, there is no happiness in the life of programmers.

Additional links.

Wikipedia. Clang. http://www.viva64.com/go.php?url=713
Library VivaCore. http://www.viva64.com/ru/vivacore-library/

Source: https://habr.com/ru/post/125626/

All Articles