Under the hood: build and open source flint

Static code analysis programs are an unusual class of program verifiers, and for some time I was not convinced of the need to use them when developing for Facebook. I do not tolerate stylistic rules on my neck, and false error warnings can spoil the whole task. However, they have some good ones: if the verifier mechanically looks for problems that are traditionally not controlled by the compiler, then this should almost always improve the quality of the code as soon as the problem is fixed.
Flint , a Facebook program for static analysis, produces analysis errors that automatically appear on our phabricator review system next to each proposed code change, notifying the programmer that something might go wrong. Flint has become an important part of the work we are doing on Facebook, and I am very happy to open its source code so that everyone can check what we are doing and try it for ourselves.

But why not use existing code analyzers?

Writing a code analyzer for C ++ is not a task for the faint of heart, because C ++ is very difficult to parse. But nevertheless, at present there is a whole bunch of analyzers with many features, some even with open source. So the question why we decided to write our own, and not to use the existing ones, is quite logical.
When we started this project, all the programs we tested were too slow and did not support most of the C ++ 11 innovations that we already used in development. Clang, which today will be a logical starting point for analyzing C ++ code, offered too little support at the time. And even now, it cannot compile a part of our C ++ code.
And most importantly, the rules of code analyzers are very dependent on the nature of the organizations that use them. We imagined what we were looking for and found out that whichever analyzer we chose, we would have to refine it for a long time. So we decided to develop our own code analyzer.

Tokens, comments and language D, oh!

Based on the principle of “the simplest solution that will work”, flint is token-oriented, which is contrasted with the construction of a code parsing tree. The analyzer loads the input file, converts it into an array of tokens and analyzes this array in different ways. Each token saves the previous comment (if any), so that commenting information is saved. Some of our rules require the use of comments in a special style, you will see it below.
The purpose of this design was to implement a fast tokenizer, add a couple of simple rules and release it from Facebook, with the hope that people will add interesting rules to the analyzer. The decision to add parsing was thrown out, but our engineers added about two dozen rules that we will check in a minute.
Flint is written in D, and this is the first open source project in this language from Facebook. In fact, our first version was written in C ++; the translation to D began as an experiment. According to the results of measurements and stories in which we participated, the transfer to D was a victory in all directions: the D version turned out to be much smaller, it was going much faster, it started much faster and was easier to add changes.
')

Flint translation from C ++ to D

To move from C ++ to D, I decided to make a semi-mechanical translation, that is, use the closest D code with the same semantics as the C ++ code.
When implemented in C ++, I found out that using the token generator brought more problems than good, so I just wrote a fast, dedicated lexer using macros. There are no macros in D, which makes one-to-one translation impossible. But D has something better — a full interpretation during compilation, which is combined with the ability to self-analyze, generate and compile the generated code on the fly.

I wrote a 58 line function that generates an expanded tree of matches. For C ++, the code is expanded into 965 lines, which are then fed back to the compiler with a mix of expressions.

There is at least one plus in this matching - it is independent of the language that we break into lexemes; this approach makes it possible to stick it into the library and leave only language-specific parts (such as parsing numbers or comments). With this approach, it becomes easier to create and use optimal lexers for any language without the need for a third-party generator. The implementation of this solution took part of the first day, after which I had a working lexer, the following days were spent on porting the analyzer and checking it on myself.

After translation, the flint editing / build / test cycle has become much more pleasant. The flint build on D is about five times faster than the C ++ build, which turned out to be very important in iterative development. The launch speed was a little better, between 5 and 25% - depending on the file, the gain was greater for large files.

Quick build has an interesting side effect for me. The difference between normal build times in C ++ and D is not surprising, but it was the first time I worked on similar projects in parallel, switching between them, which allowed me to feel the difference.

C ++ can produce fast code, but C ++ build is accompanied by a whole bunch of unnecessary gestures and lots of noise. The initiation of the assembly is inevitably accompanied by some ceremoniality and pomp (“back, guys, returns are possible!”) And during the day I would carefully monitor the assembly to maximize the benefit of time spent. The build cycle of projects on D happens to be discouragingly fast — even somehow offensive sometimes. The agility with which the compiler D passes through the code first surprised me, I even suspected an error - that my code was not compiled at all. In fact, the new language is simply able to run better through the code at much higher speeds.

Flint checks

Each check corresponds to the actual name of the function in the code base, so you can simply find it to see how this check works.

Black list of token sequences (checkBlacklistedSequences). Some sequences may simply be prohibited in the organization. On Facebook, we consider “volatile” as forbidden, but not always: we allow the sequence “asm volatile”, which means something else.
Blacklist of identifiers (checkBlacklistedIdentifiers). Some identifiers may be prohibited in the organization. On Facebook, we exclude the infamous function of the C language - strtok. There are safe alternatives for it, so there is no reason to use strtok.
Reserved identifiers (checkDefinedNames). Both in C and C ++ there is an often forgotten naming rule, according to which all identifiers starting with an underscore and a subsequent capital letter, as well as all identifiers containing two consecutive underscores, are reserved for official needs. (Of course in our code there are exceptions to this rule, such as _GNU_SOURCE or _XOPEN_SOURCE, therefore flint also uses whitelist when checking reserved identifiers).
The idiom of including guards (checkIncludeGuard). Most header files should be explicitly protected (by using the #pragma once or #ifndef macro directives), so we added a rule to test this protection.
The order of the arguments in memset (checkMemset). Many of us sometimes wrote memset (& foo, sizeof (foo), 0); and told terrible stories about it, scaring nieces on Halloween. The corresponding flint diagnostic rule makes it easy to avoid this fatal error.
Questionable inclusions (#include) - checkQuestionableIncludes. Many organizations use multiple libraries that are stored for backward compatibility only and are never used in new code. This means that such heders should not be included - a good job for the analyzer. One of the problems we found on Facebook was that some of the headers after the preprocessor became too large to be included in others - this resulted in too much compile time. We had to teach Flint to solve this problem.
Selecting "...- inl.h" files (checkInlHeaderInclusions). A popular way to organize embedded and heavy template code is to appropriately separate inline and template artifacts — for example, “Widget.h” becomes “Widget-inl.h”. The latest file should not be included somewhere other than “Widget.h”, and a special rule follows this.
Initializing the variable itself (checkInitializeFromItself). We found out that people write designers in style
```
class X { .. int a_; X(const X& rhs) : a_(a_) {} X(int a) : a_(a_) {} }; 
```
In this example, you must use a_ (rhs.a_) in the first constructor and a_ (a) in the second. In another writing, almost never makes sense, and the compiler keeps its mouth shut on such code. We like to say: “There is a flint rule for this” when solving a problem.
It is preferable to use shared_ptr p(make_shared(args)) shared_ptr p(new T) (checkSmartPtrUsage). . (checkOSSIncludes). Facebook , , : . Folly, , - . . , ? (checkNamespaceScopedStatics). - . , . . "Widget.cpp" "Widget.h" - (checkIncludeAssociatedHeader). , "Widget.h" - , "Widget.h" - . (checkCatchByReference). 'Nuff said. . explicit, . , /* implicit */. , :shared_ptr p(make_shared(args)) shared_ptr p(new T) (checkSmartPtrUsage). . (checkOSSIncludes). Facebook , , : . Folly, , - . . , ? (checkNamespaceScopedStatics). - . , . . "Widget.cpp" "Widget.h" - (checkIncludeAssociatedHeader). , "Widget.h" - , "Widget.h" - . (checkCatchByReference). 'Nuff said. . explicit, . , /* implicit */. , :
```
 class C { ... //  ,     C(int a) C(const C&&); //  ,    explicit C(int a); /* implicit */ C(char* a); C(int a, double b); }; 
```
Throw specification (checkThrowSpecification). Throw specifications are outdated and should be removed from the code. Flint allows for another improvement: classes that inherit from std :: exception must add throw () in the destructor and implement the what () method.
Check against throwing an exception on pointers initialized with new (checkThrowsHeapException). This prevents the throw new T.
Conflicting directives using namespace (checkUsingNamespaceDirectives). Some namespaces use the same identifier — for example, boost and STL: they both define shared_ptr. We experienced an incompatibility problem, so we added a parser rule that prevents the use of conflicting namespaces at the same time.
Namespaces in header files (checkUsingDirectives). "using namespace" should never occur inside a header. But this can occur inside the inline function in the header file. Facebook believes that the namespace "facebook" can only be entered at the top level. You need to replace the name of a namespace with some of your own.
Incorrect use of libraries (checkFollyDetail). It is a common practice to place library-dependent code in a namespace of the type "detail". Sometimes such code cannot be encapsulated from the client side due to the transparency of the templates in C ++. Flint issues warnings against the use of such namespaces, named in the style of "folly :: detail".
Passing "cheap" types by reference (checkFollyStringPieceByValue). Some types, such as iterators or a pair of iterators, are small and cheap to copy. Therefore, they are easier to pass by value instead of passing by a constant link. An example is StringPiece in Folly, which takes two words and has the semantics of simple copying.
No protected inheritance (checkProtectedInheritance). Protected inheritance is a strange thing, and is left only to complete the picture, but not for practical use.
No implicit conversion operators (checkImplicitCast). Implicit operator conversion is as dangerous as implicit constructor conversion:
```
 class C { //  ,     operator string(); operator bool(); //  ,    /* implicit */ operator string(); explicit operator bool(); }; 
```
A bad and obsolete NULL should be replaced with nullptr everywhere (checkUpcaseNull).
Checking that std :: exception is always publicly inherited (checkExceptionInheritance). Note:
```
 class MyException : std::exception { ... }; 
```
The author wanted to define his own exception class, but forgot to specify inheritance to be public. In accordance with the rules of the language, the default inheritance is private. Ultimately, an interesting effect will manifest itself in the fact that the code that will have to intercept all standard and user exceptions - catch (const std :: exception &), will not be able to work correctly, because private inheritance eliminates implicit type conversion. This is not a type of error that is easy to detect without very complex testing. To prevent such errors, flint statically disables non-public inheritance from std :: exception25.
Check for "fleeting” rvalue constructs that are sometimes useless (checkMutexHolderHasName). Consider: mutex m_lock; ... lock_guard (m_lock); Regardless of the target, this code does nothing useful: it is just a constructor call to lock_guard. This will create a rvalue , which will live between the closing bracket and the semicolon. In confirmation of Carmack's Law ("Everything that is syntactically correct and accepted by the compiler, sooner or later gets into your code") you get a vital need for parsing rules. Ideally, it should intercept all unused variables. At the moment we define only a few common suspicious places.

Flint source discovery

We are very happy to open the source of flint, because it is a good illustration of "the simplest solution that can work" and an interesting example of cross-language translation. We hope that you will find the flint useful - we have already found. Stay tuned for future improvements, we look forward to your feedback.

Thanks

Many thanks to Nicholas Ormrod and Robbert Haarman for viewing an early draft of this article. Too many engineers contributed to the flint source to list them all here; their input is noted in git's journal.

Translator's Note
In the C ++ hub there are a lot of articles about static code analysis - and almost all of them are devoted to PVS Studio and CppCat. When facebook announced its similarity to the analyzer, this news went unnoticed. The article by Alexandrescu at first seemed difficult for me to understand, so I decided to translate it. After the translation, it still remains difficult - so I will be happy with adequate criticism and assistance in translating. There are inaccuracies and incomprehensible places - and I will be grateful to those who will help with these places. I cannot say that I understood the 16th rule - I will be glad if in the comments or personal correspondence someone can explain it - I will add it to the article.
My subjective opinion is that flint checks the style and rules of writing code more than it searches for actual errors - this is clearly inferior to CppCheck and to Viva64 products. Diagnostic rules are really specific - some of them are applicable to any code (protection of headers and the order of their inclusion), and some - the other way around: if you do not plan to release your code outside, then there is no point in abandoning open source libraries.

I am not familiar with the D language, so if someone tries out flint in business and writes an article about it, it will be very cool. Well, comparing a product from Facebook with existing alternatives, a story about adding rules - I think it will be very interesting - we are waiting for the interested author!

Source: https://habr.com/ru/post/224419/

All Articles

Under the hood: build and open source flint

But why not use existing code analyzers?

Tokens, comments and language D, oh!

Flint translation from C ++ to D

Flint checks

Flint source discovery

Thanks

More articles: