clang and the IDE: a story of friendship and enmity

For me it all began six and a half years ago, when fate drew me into a closed project. Whose project it was, don't ask; I won't tell. I will only say that its idea was as simple as a rake: embed the clang front end into an IDE. Much as was recently done in Qt Creator, in CLion (in a sense), and so on. Clang was then a rising star, and many were thrilled by the chance to finally use a full-featured C++ parser almost for free. The idea was, so to speak, literally in the air (and the code completion API built into clang rather hinted at it); you just had to take it and do it. But, as Boromir said, "One does not simply...". So it was in this case. The details follow below.

First, the good

The benefits of using clang as the C++ parser built into an IDE are undeniable. After all, an IDE's functions are not limited to editing files. There is also the symbol database, navigation, dependency tracking, and much more. This is where a full-fledged compiler really shines, because handling the full power of the preprocessor and templates in a relatively simple hand-written parser is no trivial task. Usually you have to make many compromises, which obviously affects the quality of code analysis. Those interested can look, for example, at Qt Creator's built-in parser: Qt Creator C++ parser

The same Qt Creator sources show that the items listed above are not all that an IDE requires from its parser. In addition, you need at least:

syntax highlighting (lexical and semantic)
all kinds of on-the-fly hints that show information about a symbol
hints about what is wrong with the code and how it can be fixed or completed
code completion in a wide variety of contexts
a wide range of refactorings

So with the benefits listed above (really serious ones!) the advantages end and the pain begins. To understand this pain better, you can start with the talk by Anastasia Kazakova (anastasiak2512) about what is actually required of a code parser built into an IDE.

The essence of the problem

The problem is simple, although it may not be obvious at first glance. In a nutshell: clang is a compiler. It treats code the way a compiler does. It is built on the premise that the code it is given is already finished, not a stub of a file that is currently open in the IDE editor. Compilers do not like file fragments, incomplete constructs, misspelled identifiers, retrun instead of return, and the other delights that can arise here and now in the editor. Of course, before compilation all of this will be cleaned up, corrected, and tidied. But here and now, in the editor, it is what it is. And it is in exactly this form that the code lands on the table of the IDE's built-in parser every 5-10 seconds. A hand-written parser "understands" perfectly well that it is dealing with a half-finished product; clang does not. And it is very surprised. What comes of that surprise depends on the circumstances.

Fortunately, clang is reasonably tolerant of errors in the code. Nevertheless, surprises are possible: highlighting that suddenly disappears, crooked autocompletion, strange diagnostics. You need to be ready for all of this. In addition, clang is not omnivorous. It has every right to reject something in the headers of the compiler that is used here and now to build the project. Tricky intrinsics, non-standard extensions, and other, um, features can all lead to parse errors in the most unexpected places. And, of course, there is performance. Editing a Boost.Spirit grammar file or working on an LLVM-based project will be a real "pleasure". But let's take everything in order.

Half-finished code

Suppose you have just started a new project. The environment generated a default stub of main.cpp for you, and in it you wrote:

    #include <iostream>

    int main()
    {
        foo(10)
    }

From the point of view of C++ this code is, frankly, invalid. There is no definition of the function foo(...) in the file, the statement is not finished, and so on. But you have only just started. The code has every right to look exactly like this. How does an IDE with a hand-written parser (in this case CLion) take this code?

And if you click on the light bulb, you can see this:

An IDE like this, knowing something, um, more about what is going on, offers exactly the expected option: create the function from its usage context. A great suggestion, I think. How does a clang-based IDE behave (in this case Qt Creator 4.7)?

And what does it offer to fix the situation? Nothing! Only the standard rename!

The reason for this behavior is quite simple: for clang this text is complete (and cannot be anything else). It builds the AST on that assumption. From there everything is straightforward: clang sees a previously undeclared identifier. This is C++ text, not C, and C++ has no implicit function declarations. No assumptions are made about the nature of the identifier: it is not declared, so the code fragment is invalid. And nothing appears in the AST for this line. It simply is not there. And what is not in the AST cannot be analyzed. It's a shame, it's annoying, but so be it.

The parser built into the IDE relies on different assumptions. It knows that the code is unfinished. That the programmer is in the middle of a thought and the fingers cannot keep up with it. Therefore not every identifier has to be defined. Such code is, of course, incorrect by the compiler's high standards, but the parser knows what can be done with it and offers options. Quite reasonable ones.
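To make "create the function from its usage context" a little more concrete, here is a minimal sketch of the kind of heuristic a hand-written parser could apply to an unresolved call such as foo(10). This is not how CLion does it; guessLiteralType and makeFunctionStub are hypothetical names, and guessing parameter types from the spelling of literals is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: guess a C++ type from the spelling of a literal argument.
// Anything unrecognized falls back to "auto" (valid as a parameter type only
// since C++20), so the user is expected to edit the generated stub.
inline std::string guessLiteralType(const std::string& arg)
{
    if (arg.empty())
        return "auto";
    if (arg.front() == '"')
        return "const char*";
    if (arg.front() == '\'')
        return "char";
    if (arg == "true" || arg == "false")
        return "bool";

    bool sawDigit = false;
    bool sawDot = false;
    for (std::size_t i = 0; i < arg.size(); ++i) {
        const char c = arg[i];
        if (std::isdigit(static_cast<unsigned char>(c)))
            sawDigit = true;
        else if (c == '.' && !sawDot)
            sawDot = true;
        else if (!(i == 0 && (c == '-' || c == '+')))
            return "auto";              // not a plain numeric literal
    }
    if (!sawDigit)
        return "auto";
    return sawDot ? "double" : "int";
}

// Hypothetical quick-fix: build a declaration stub for an unresolved call.
// The return type is assumed to be void because the call result is unused.
inline std::string makeFunctionStub(const std::string& name,
                                    const std::vector<std::string>& args)
{
    std::string stub = "void " + name + "(";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0)
            stub += ", ";
        stub += guessLiteralType(args[i]) + " arg" + std::to_string(i + 1);
    }
    stub += ");";
    return stub;
}
```

For the snippet above, makeFunctionStub("foo", {"10"}) produces the text void foo(int arg1); which an IDE could offer to insert before main(). The point is that such a heuristic works on the unresolved call itself, which is exactly the piece that never makes it into clang's AST.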

At least up to version 3.7 (inclusive), similar problems arose in this code:

    #include <iostream>

    class Temp
    {
    public:
        int i;
    };

    template<typename T>
    class Foo
    {
    public:
        int Bar(Temp tmp)
        {
            Tpl(tmp);
        }

    private:
        template<typename U>
        void Tpl(U val)
        {
            Foo<U> tmp(val);
            tmp.    // code completion is requested here
        }

        int member;
    };

    int main()
    {
        return 0;
    }

Inside the methods of a template class, clang-based autocompletion did not work. As far as I could figure out, the reason was the two-pass parsing of templates. Autocompletion in clang works during the first pass, when there may not yet be enough information about the types actually used. In clang 5.0 (judging by the release notes) this was fixed.

One way or another, there may well be situations in which the compiler is unable to build a correct AST (or draw the right conclusions from the context) for the code being edited. In that case the IDE simply will not "see" the corresponding parts of the text and will not be able to help the programmer in any way. Which, of course, is not great. The ability to work effectively with incorrect code is vital for the parser in an IDE, and something an ordinary compiler does not need at all. Therefore the parser in an IDE can use many heuristics that for a compiler would be not just useless but harmful. And implementing two modes of operation in the compiler would first require convincing its developers.

"This role is a dirty one!"

A programmer usually has one IDE (well, two), but many projects and toolchains. And, of course, nobody wants to make extra moves to switch from toolchain to toolchain, from project to project. One or two clicks, and the build configuration changes from Debug to Release, and the compiler from MSVC to MinGW. But the code parser in the IDE stays the same. It has to switch, together with the build system, from one configuration to another, from one toolchain to another. And the toolchain may be something exotic, or a cross-compiling one. The parser's job is to keep parsing the code correctly. Preferably with a minimum of errors.

clang is fairly omnivorous. It can be made to accept the extensions of the Microsoft compilers and of gcc. You can pass it options in the format of those compilers, and clang will even understand them. But none of this guarantees that clang will digest every header from the innards of a gcc built from trunk. Any __builtin_intrinsic_xxx can become a stumbling block. So can language constructs that the clang version inside the IDE simply does not support yet. The quality of the AST for the file currently being edited will most likely not suffer. But building the global symbol database or saving precompiled headers may break. And that can be a serious problem. An even bigger problem is such code not in toolchain or third-party headers, but in the headers or sources of the project itself. By the way, all this is a weighty reason to tell the build system (and the IDE) explicitly which header files are "foreign" to your project. It can make life easier.

Again, an IDE is designed from the start to be used with different compilers, settings, toolchains, and so on. It is designed to deal with code in which some elements are not supported. The release cycle of IDEs (not all of them :) is shorter than that of compilers, so there is the potential to pick up new features and respond to discovered problems more quickly. In the compiler world things are a little different: the release cycle is at least a year, and cross-compiler compatibility problems are solved by conditional compilation and shifted onto the developer's shoulders. A compiler does not have to be universal and omnivorous; its complexity is high enough already. clang is no exception.
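For illustration, here is what "solved by conditional compilation" typically looks like on the developer's side. This is a minimal sketch: __has_builtin is a feature-test macro provided by clang (and by recent gcc), __builtin_popcount is a gcc/clang builtin, and popcount32 is a hypothetical name chosen for the example.

```cpp
#include <cassert>

// Compilers without __has_builtin get a fallback that always answers "no".
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

// Hypothetical helper: count the set bits in a value.
inline int popcount32(unsigned value)
{
#if __has_builtin(__builtin_popcount)
    // Fast path: the compiler (or the IDE's parser!) knows this builtin.
    return __builtin_popcount(value);
#else
    // Portable path: clear the lowest set bit until nothing is left.
    int count = 0;
    while (value != 0) {
        value &= value - 1;
        ++count;
    }
    return count;
#endif
}
```

Code guarded this way is accepted both by the compiler that builds the project and by a different front end that parses it inside the IDE. A header that calls a builtin unconditionally is precisely the kind of thing that can trip up the IDE's parser.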

The fight for speed

Whatever part of the time in the IDE the programmer does not spend in the debugger, they spend editing text. The natural desire here is comfort (otherwise why use an IDE? Notepad would do!). Comfort implies, among other things, a fast editor response to text changes and hotkeys. As Anastasia rightly noted in her talk, if five seconds after pressing Ctrl+Space the environment has not responded with a menu or a completion list, that is terrible (I'm serious, try it yourself). In numbers, this means that the parser built into the IDE has about one second to evaluate the changes in the file and rebuild the AST, and another one or two to offer the developer a context-sensitive choice. One second. Well, maybe two. Besides, it is considered expected behavior that if the developer changes a .h file and then switches to a .cpp file, the changes are "visible" there. The files are open in adjacent windows, after all. And now a simple calculation. If clang, launched from the command line, takes ten to twenty seconds to process a source file, what reason is there to believe that, launched from the IDE, it will process the same file much faster and fit into that second or two? In general, I could stop here, but I won't.

With ten to twenty seconds per source file I am, of course, exaggerating. Although if some heavy API is included, or, say, Boost.Spirit with Hana to boot, and all of it is actively used in the text, then 10-20 seconds is still a good figure. But even if the AST is ready three or four seconds after the built-in parser starts, that is already a long time. Bear in mind that such runs must happen both regularly (to keep the code model and index consistent, to highlight, to hint, and so on) and on demand: code completion is also a compiler run, after all. Can this time be reduced somehow? Unfortunately, with clang as the parser there are not many options. The reason: it is a third-party tool in which (ideally) no changes can be made. That is, digging into the clang code with a profiler, optimizing, simplifying some branches: none of this is available, and you have to make do with what the external API provides (and with libclang that API is also quite narrow).

The first, obvious, and in fact the only solution is to use dynamically generated precompiled headers. With an adequate implementation the solution is a killer one. It speeds up compilation several times over, at the very least. Its essence is simple: the environment collects all third-party headers (or headers outside the project root) into a single .h file, builds a PCH from that file, and then implicitly includes this PCH in every source file. Of course there is an obvious side effect: in a source file (at the editing stage) symbols may be visible that the file itself does not include. But that is the price of speed. You have to choose. And all would be well if not for one small problem: clang is still a compiler. And, being a compiler, it does not like errors in the code. If all of a sudden (all of a sudden! see the previous section) there are errors in the headers, the .pch file is not created. At least that was the case up to version 3.7. Has anything changed since then? I don't know; I suspect not. And, alas, I no longer have the opportunity to check.
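The "collect everything into a single .h file" step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from that project: includeTarget and makeUmbrellaHeader are hypothetical names, and "third-party" is approximated here as "included with angle brackets", whereas a real IDE would resolve paths against the project root.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the target of an #include directive together
// with its delimiters (e.g. "<vector>"), or "" if the line is not one.
// Only the plain `#include <x>` and `#include "x"` forms are handled.
inline std::string includeTarget(const std::string& line)
{
    std::size_t pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos || line[pos] != '#')
        return "";
    pos = line.find_first_not_of(" \t", pos + 1);
    if (pos == std::string::npos || line.compare(pos, 7, "include") != 0)
        return "";
    pos = line.find_first_of("<\"", pos + 7);
    if (pos == std::string::npos)
        return "";
    const char close = (line[pos] == '<') ? '>' : '"';
    const std::size_t end = line.find(close, pos + 1);
    if (end == std::string::npos)
        return "";
    return line.substr(pos, end - pos + 1);
}

// Build the text of the single header from the texts of the project sources.
// Assumption: angle-bracket includes are the third-party ones.
inline std::string makeUmbrellaHeader(const std::vector<std::string>& sources)
{
    std::set<std::string> external;     // sorted and de-duplicated
    for (const std::string& text : sources) {
        std::istringstream in(text);
        std::string line;
        while (std::getline(in, line)) {
            const std::string target = includeTarget(line);
            if (!target.empty() && target.front() == '<')
                external.insert(target);
        }
    }
    std::string header = "// generated umbrella header\n";
    for (const std::string& target : external)
        header += "#include " + target + "\n";
    return header;
}
```

The generated text would then be written to a file and handed to clang to produce the PCH (on the command line this roughly corresponds to compiling it with -x c++-header and passing the result to each source with -include-pch). The side effect mentioned above is visible right in the sketch: every source file sees the union of all external headers, not only the ones it includes itself.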

Alas, no alternatives are available, for the same reason: clang is a compiler and a thing "in itself". Actively intervening in AST generation, somehow making it merge the AST from separate pieces, maintaining an external symbol database, and so on and so forth: alas, none of this is available. Only the external API, only hardcore, and the settings available through compilation options. And then analysis of the resulting AST. If you use the C++ version of the API, a little more becomes possible. For example, you can play with custom FrontendActions, tune compilation options more finely, and so on. But even then the main point does not change: the text being edited (or indexed) is compiled independently of the others and in full. That's it. Period.

It is possible (maybe!) that someday a fork of upstream clang will appear, tailored specifically for use inside an IDE. Maybe. But for now, everything is as it is. Say, Qt Creator's integration with libclang (up to the "final" stage) took seven years. I tried QtC 4.7 with the libclang-based engine and, I confess, I personally like the old version (with the hand-written parser) better, simply because it works better on my cases: it hints, it highlights, and everything else. I won't try to estimate how many man-hours were spent on this integration, but I would venture that in that time they could have polished their own parser. As far as I can judge (from indirect signs), the team working on CLion looks warily at integration with libclang / clang++. But that is purely my personal guess. Integration at the Language Server Protocol level is an interesting option, but for C++ specifically I tend to see it as a palliative, for the reasons listed above. It simply moves the problems from one level of abstraction to another. But perhaps I am wrong and LSP is the future. We'll see. Either way, the life of the developers of modern C++ IDEs is full of adventures, with or without clang as a backend.

Source: https://habr.com/ru/post/419009/
