We randomly checked out the Clang project. I think the result will be curious to a number of developers. Details under the cut.
PVS-Studio now uses the external preprocessor Microsoft Visual C ++, which is a significant drawback. The Visual C ++ preprocessor is extremely slow and has errors that we cannot fix. Yes, don't be surprised that the preprocessor is working out of hands badly, but this does not prevent you from quickly and correctly compiling files in Visual Studio. I do not know exactly how cl.exe works, but from circumstantial evidence I can assume that there are two completely different preprocessing algorithms used for compilation and for creating * .i files. Apparently, this is more convenient in terms of the compiler architecture.
Recently, we have been actively seeking an alternative solution for preprocessing files. And apparently, our choice will stop at the preprocessor implemented in Clang. Preliminary measurements show that it works several times faster than cl.exe, which is very nice and useful.
Clang is a new compiler for C-like languages (C, C ++, Objective-C, Objective-C ++, in the process support for C ++ 11). The development is sponsored by Apple. One of the strengths of the Clang compiler is the large number of static code analysis rules. In practice, Clang is already used by Apple as a static analysis tool.
')
Clang was originally designed to maximize the preservation of information during the compilation process [1]. This feature allows Clang to create deployed context-oriented error messages that are understandable for both programmers and development environments. The modular design of the compiler allows it to be used as part of the development environment for syntax highlighting and refactoring.
So, for good, PVS-Studio, perhaps, should be based on Clang, and not on VivaCore [2]. But now it's too late and everything is not so simple. In this case, we would be too dependent on the capabilities of a third-party library. Yes, and support Microsoft Specific in the project Clang not in a hurry.
However, we were distracted. It is most interesting to check one code analyzer with another code analyzer. We did it, since all the same have already begun to explore the draft Clang.
Unfortunately, a full comparison is impossible. It would be great to check Clang with PVS-Studio and vice versa. And then count the number of errors found per one thousand lines. The trouble is that we can check Clang, but he doesn’t. In Clang, MSVC support is experimental. Plus, the case is complicated by the fact that PVS-Studio already uses the features of C ++ 11. A simple compile attempt results in errors like 'These language extensions are not yet supported'.
We'll have to limit ourselves to the fact that we managed to find PVS-Studio in the Clang code. Naturally, what will be described in the article is not all the errors that were found. Clang is about 50 megabytes of source code. I will notify the developers about the defects found and, if they are interested, they will be able to check their project themselves and examine the entire list of potential errors. However, they
heard about us and discussed some of our diagnostics.
Let's see what we found interesting. Almost all errors found are Copy-Paste errors.
Copy-Paste N1 Error
static SDValue PerformSELECTCombine(...) { ... SDValue LHS = N->getOperand(1); SDValue RHS = N->getOperand(2); ... if (!UnsafeFPMath && !DAG.isKnownNeverZero(LHS) && !DAG.isKnownNeverZero(RHS)) ... if (!UnsafeFPMath && !DAG.isKnownNeverZero(LHS) && !DAG.isKnownNeverZero(LHS)) ... }
PVS-Studio Diagnostics: V501 There are identical sub-expressions '! DAG.isKnownNeverZero (LHS)' operator. LLVMX86CodeGen x86isellowering.cpp 11635
In the beginning, the code simultaneously works with LHS and RHS, and then only with LHS. The reason for this, apparently, was a slip or a copied fragment of the string. As I understand it, in the second case the RHS variable must also be present.
Copy-Paste N2 Error
MapTy PerPtrTopDown; MapTy PerPtrBottomUp; void clearBottomUpPointers() { PerPtrTopDown.clear(); } void clearTopDownPointers() { PerPtrTopDown.clear(); }
PVS-Studio Diagnostics: V524 It’s odd that the body of the clearTopDownPointers function is fully equivalent to the body of the clearBottomUpPointers function (ObjCARC.cpp, line 1318) LLVMScalarOpts objcarc.cpp 1322
Classic Copy-Paste. The function has been copied. Her name was changed, but not the body itself. The correct option is:
void clearBottomUpPointers() { PerPtrBottomUp.clear(); }
Copy-Paste N3 Error
static Value *SimplifyICmpInst(...) { ... case Instruction::Shl: { bool NUW = LBO->hasNoUnsignedWrap() && LBO->hasNoUnsignedWrap(); bool NSW = LBO->hasNoSignedWrap() && RBO->hasNoSignedWrap(); ... }
PVS-Studio diagnostics: V501 There are identical sub-expressions 'LBO-> hasNoUnsignedWrap ()' operator. LLVMAnalysis instructionsimplify.cpp 1891
Apparently, they wanted to write like this:
bool NUW = LBO->hasNoUnsignedWrap() && RBO->hasNoUnsignedWrap();
Copy-Paste N4 Error
Sema::DeduceTemplateArguments(...) { ... if ((P->isPointerType() && A->isPointerType()) || (P->isMemberPointerType() && P->isMemberPointerType())) ... }
Diagnostics of PVS-Studio. V501 There are identical sub-expressions "P-> isMemberPointerType ()". clangSema sematemplatededuction.cpp 3240
This is just a case, unlike the fifth example. It is clear that they wanted to write like this:
(P-> isMemberPointerType () && A-> isMemberPointerType ())
Error Copy-Paste N5
static bool CollectBSwapParts(...) { ...
PVS-Studio Diagnostics: V523 The 'then' statement is equivalent to the 'else' statement. LLVMInstCombine instcombineandorxor.cpp 1387
And this is an example of a difficult situation. I'm not even sure the end of the error here or not. The comment does not help me either. Here the creators should analyze the code. But still, I think that here is an example of an error with Copy-Paste.
I think about Copy-Paste is still enough, because there are a number of other types of errors.
Here, for example, the classic error in switch
void llvm::EmitAnyX86InstComments(...) { ... case X86::VPERMILPSri: DecodeVPERMILPSMask(4, MI->getOperand(2).getImm(), ShuffleMask); Src1Name = getRegName(MI->getOperand(0).getReg()); case X86::VPERMILPSYri: DecodeVPERMILPSMask(8, MI->getOperand(2).getImm(), ShuffleMask); Src1Name = getRegName(MI->getOperand(0).getReg()); break; ... }
PVS-Studio Diagnostics: V519 The 'Src1Name' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 211, 215. LLVMX86AsmPrinter x86instcomments.cpp 215
In a large code, such errors are inevitable. The switch statement is very dangerous. One may well know how it works, but still forget this damned 'break'.
There are a number of, if not mistakes, then obviously stupid or suspicious actions.
Stupid action N1, which may be a mistake
AsmToken AsmLexer::LexLineComment() {
PVS-Studio Diagnostics: V501 Operator: CurChar! = '\ N' && CurChar! = '\ N' LLVMMCParser asmlexer.cpp 149
Most likely, the second check CurChar! = '\ N' is superfluous here. But perhaps this is a mistake and should be written:
while (CurChar != '\n' && CurChar != '\r' && CurChar != EOF)
Stupid action of N2, which is definitely not a mistake
std::string ASTContext::getObjCEncodingForBlock(const BlockExpr *Expr) const { ... ParmOffset = PtrSize;
PVS-Studio diagnostics: V519 The 'ParmOffset' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 3953, 3956. clangAST astcontext.cpp 3956
N3 stupid action, which I find it difficult to describe
static unsigned getTypeOfMaskedICmp(...) { ... result |= (icmp_eq ? (FoldMskICmp_Mask_AllZeroes | FoldMskICmp_Mask_AllZeroes | FoldMskICmp_AMask_Mixed | FoldMskICmp_BMask_Mixed) : (FoldMskICmp_Mask_NotAllZeroes | FoldMskICmp_Mask_NotAllZeroes | FoldMskICmp_AMask_NotMixed | FoldMskICmp_BMask_NotMixed)); ... }
Diagnostics of PVS-Studio:
V501 There are identical sub-expressions 'FoldMskICmp_Mask_AllZeroes' operator. LLVMInstCombine instcombineandorxor.cpp 505
V501 There are identical sub-expressions 'FoldMskICmp_Mask_NotAllZeroes' to the left and to the right of the '|' operator. LLVMInstCombine instcombineandorxor.cpp 509
I do not know, these are simple duplicates or a different condition must be written. Difficult to assume that there should actually be.
There is a code that is just potentially dangerous.
This code will work as long as the two enum have a similar structure.
enum LegalizeAction { Legal,
Diagnostics of PVS-Studio:
V556 The values of different types are compared: switch (ENUM_TYPE_A) {case ENUM_TYPE_B: ...}. LLVMAsmPrinter targetlowering.h 268
V556 The values of different types are compared: switch (ENUM_TYPE_A) {case ENUM_TYPE_B: ...}. LLVMAsmPrinter targetlowering.h 270
Having TypeLegal is consonant with Legal, and the name TypeExpandInteger is consonant with Expand. This was the cause of a typo. The code only works because you are lucky and the values of these names are the same.
Conclusion
Is it terrible when there are errors in the compiler? :)
PS
It seems I hurried to praise Clang. Just came across a situation where, when preprocessing, it corrupts the code. We have the following fragment in atlcore.h:
ATLASSUME(p != NULL);
The Clang preprocessor turns it into:
do { ((void)0); #pragma warning(push) #pragma warning(disable : 4548) do {__noop(!!(p != 0));} while((0,0) #pragma warning(pop) ); } while(0);
He placed the 'if' operator after the comment and it turned out that “if (* p == '\ 0')” is now also a comment. As a result, we have an incorrect code. Eh, there is no happiness in the life of programmers.
Additional links.
- Wikipedia. Clang. http://www.viva64.com/go.php?url=713
- Library VivaCore. http://www.viva64.com/ru/vivacore-library/