Checking Vim with PVS-Studio on GNU / Linux

The reader might think that this is another article about testing another project from the free software world, but in fact, the article is not so much about testing as about the practice of using the PVS-Studio analyzer in a fully GNU / Linux environment. It is not by chance that Vim became the choice of the project for verification, for he, too, served in this matter.

First, a little about Vim

Vim ( http://vim.org ) is a cross-platform free text editor with a 30-year history, which is the successor of the vi editor and came from the world of Unix systems.
')
Vim is widely used in administration and development, in many GNU / Linux distributions it is the default editor. Vim differs from other text editors with its focus on using only the keyboard, a text interface, and rich extensibility through a system of plug-ins written in Vim Script.

Now about the test itself

One option for testing Linux projects is to integrate into the build system, for example, GNU Make. To verify vim, this method was chosen. For each call to the compiler, a call to the analyzer was added to the makefile. For convenience, this call was wrapped in a Make variable in a similar way:

#PVS Studio vars PVS_CFLAGS = $(ALL_CFLAGS) PVS_INCFLAGS = -I$(srcdir) PVS_STUDIO = ~/PVS-Studio/PVS-Studio –cfg \ ~/PVS-Studio/PVS-Studio_vim.cfg --source-file \ $< --cl-params $(PVS_CFLAGS) -c $(PVS_INCFLAGS) $<

Next, the project is built in the usual mode with the make command (if you wish, you can add a separate target for analysis, for example, ".analysis"). The result, in addition to the collected project, is a raw analyzer log.

Note. The analyzer can be launched in parallel when building a project in parallel. Each running instance of the analyzer writes its own portion of messages to the log. Therefore, keep in mind that the analyzer does not clear the file with the raw log. Therefore, before restarting the check, you must delete the log from the previous check yourself.

It is impossible to work with a raw log. There are many duplicate messages in this log (when one .h file is included in several .cpp files). To change the parameters of the analysis, it is necessary, after editing the configuration file, to run a check again, which seriously affects the time for large projects. Even if we just wanted, for example, to disable warnings related to files from a specific directory. To solve this problem, C ++ has written the log-parser utility, which processes the PVS-Studio raw log, removes duplicate messages, applies filters to messages from its options file and displays analyzer warnings in one of the supported formats. The utility works very quickly (even on the verification logs of very large projects, its work time does not exceed 2-3 seconds), which makes it possible to quickly and easily change some analysis parameters and get a new list of messages.

If necessary, you can easily add new output formats. At the moment there are two: xml and the so-called errorfile. As far as I understand, he does not have an official name, this is the format in which many programs under Linux give out messages, for example, grep, gcc compilation errors, etc. It was he who helped us.

Unlike Windows, where the vast majority of developers use Visual Studio, in the GNU / Linux world there are many IDEs, text editors, and others with their share of users. There is no consensus and advantage of the parties, and everyone chooses an instrument to his liking. However, when checking projects, it is important not only to get the analysis results, but also to have the opportunity to work with them conveniently, as this possibility is provided by the integration of PVS-Studio with Visual Studio. The error message format described above is a kind of standard for Linux programs and most editors and development environments have some kind of support for it, but unfortunately, in most cases, this support exists only for reading compiler messages from stderr when building, in this however, it is much more convenient to take the analyzer messages from a previously prepared file.

This is where the Vim editor came in handy. Of course, you can develop an appropriate plugin for any of the other tools, but it turned out that Vim has this feature initially.

Figure 1 - Running vim with the results of the analysis.

After the analyzer and the log processing utility, it suffices to execute vim -q <errorfile>, and then in the opened editor, execute the buffer creation command with errors. For example: cw 20. And now we get a comfortable environment for working with analyzer messages and navigating through the code. Of course, I had to spend several hours studying Vim myself, since I hadn’t worked in it before, and its principles of use are very different from the more familiar text editors. But in the end, I can say that he was very imbued with the convenience of using it and from the category of incomprehensible alien gizmos he moved into the category of powerful tools for work. Accordingly, it did not take long to choose a project for verification - of course, Vim himself. The vim code turned out to be very high-quality, there were no obvious bugs in it (although the style of writing the code is rather controversial in places, but I think you can make a discount on the project’s age. Nevertheless, there were a number of places worth paying attention to. Let's talk about them in more detail .

Extra check

  if (ptr == NULL) { if (compl_leader != NULL) ptr = compl_leader; else return; /* nothing to do */ } if (compl_orig_text != NULL) { p = compl_orig_text; for (len = 0; p[len] != NUL && p[len] == ptr[len]; ++len) ; #ifdef FEAT_MBYTE if (len > 0) len -= (*mb_head_off)(p, p + len); #endif for (p += len; *p != NUL; mb_ptr_adv(p)) AppendCharToRedobuff(K_BS); } else len = 0; if (ptr != NULL) AppendToRedobuffLit(ptr + len, -1);

Analyzer Warning V595 (1) The 'ptr' pointer was used before it was verified against nullptr. Check lines: 3922, 3933.

The ptr pointer has already been checked for NULL; if this condition is true, it has been assigned the pointer comp_leader, which is definitely not null. A second check is required.

Weird memset

 /* * If requested, store and reset the global values controlling * the exception handling (used when debugging). Otherwise avoid * clear it to a bogus compiler warning when the optimizer * uses inline functions... */ if (flags & DOCMD_EXCRESET) save_dbg_stuff(&debug_saved); else vim_memset(&debug_saved, 0, 1);

where debug_saved is a structure object

 struct dbg_stuff { int trylevel; int force_abort; except_T *caught_stack; char_u *vv_exception; char_u *vv_throwpoint; int did_emsg; int got_int; int did_throw; int need_rethrow; int check_cstack; except_T *current_exception; };

Analyzer message: V512 (1) A call of the 'memset' function will be carried out.

It is difficult to say why it was necessary to reset only the first byte of the structure. If this is used as a flag, it is better to define it with a separate structure field (you can union).

Suspicious cycle

 /* check for out-of-memory */ for (i = 0; i < num_names; ++i) { if (names[i] == NULL) { for (i = 0; i < num_names; ++i) vim_free(names[i]); num_names = 0; } }

Analyzer Message: V535 (1) The variable 'i' is being used for the loop. Check lines: 1893, 1897.

In the outer and inner loop, the same array is passed through the same counter i. It is clear that when the first condition is triggered if (names [i] == NULL), the next step of the external cycle will not be executed, but the outsider will have to strain to understand this code correctly, and the peculiarity of its writing suggests either the behavior meant the author. In other words, although there is no error, but the code smells a bit. Perhaps the break operator would be better suited to stop the loop.

Scopes

 char_u *p, *old; //... { char_u buffer[BUFLEN + 1]; //... for (p = buffer; p < buffer + len; p += l) //...

Analyzer warning: V507 (2) Point to this array. Such a pointer will become invalid.

There are quite a few similar places in the code in Vim (on the issue of style). The pointer p, declared at the very beginning of the function (in some places with a global scope), stores a pointer to an array that exists only in a smaller scope and which will be deleted when it leaves the code block. At a cursory examination, it seemed to me that after going out of the buffer scope, the p pointer is used only after assigning a new value to it, but you can skip this somewhere. For what purpose, instead of declaring another variable inside the buffer scope (really to save space on the stack?), I don’t understand. This code is poorly supported and inconvenient to read.

Error with signed and unsigned types in the same expression

 for (cu = 1; cu <= 255; cu++) if (VIM_ISDIGIT(cu)) regc(cu);

Where

 #define VIM_ISDIGIT(c) ((unsigned)(c) - '0' < 10)

Analyzer Warning: V658 (2) A value is being subtracted from the unsigned variable. This can result in an overflow. In such a case, the '<' comparison operation can potentially behave unexpectedly. Consider inspecting the '(unsigned) (cu) -' 0 '<10' expression.

This code looks more like a dirty hack. When calculating the expression ((unsigned) - '0' <10), the result of the subtraction will be an unsigned value and, when comparing, both parts of the expression will also be converted to unsigned. Accordingly, when the variable cu is less than the numeric value 0, an overflow occurs. In this case, the code behaves correctly, its purpose (checking whether the character is a digit) performs, but I do not think that there are reasons to use similar techniques in your code without need. The cycle could be done from '0' and do without casting to unsigned.

The pointer is initialized to NULL and is not changed anywhere, moreover, it is used

 char_u *retval = NULL; //... if (round == 2) vim_strncpy(retval, s, len); //  retval //... if (retval == NULL) {

Analyzer Warning: V595 (1) The 'retval' pointer was used before it was verified against nullptr. Check lines: 7903, 7907.

This is like an error. The analyzer says that there is an extra check, but in fact the problem is primarily in another. The retval pointer was initialized to 0, and I did not find a single place in this function where its value would change. In this case, it is strncpy many times. After which he was suddenly decided to check for NULL.

Unsafe use of realloc

 /* TODO: check for vim_realloc() returning NULL. */ l->t = vim_realloc(l->t, newlen * sizeof(nfa_thread_T));

V701 (2) realloc () analyzer warning possible leak: when realloc () fails in allocating memory, original pointer 'l-> t' is lost. Consider assigning realloc () to a temporary pointer.

A very common error in many projects, the essence of this error is described in detail in the message of the analyzer. Fortunately, judging by the commentary, this will be fixed soon. In other places in the code Vim realloc is used correctly.

Some examples of false positives.

 if (ireg_icombine && len == 0) { /* If \Z was present, then ignore composing characters. * When ignoring the base character this always matches. */ if (len == 0 && sta->c != curc) result = FAIL;

V560 (2) A part of conditional expression is always true: len == 0.

V571 (2) Recurring check. The 'len == 0' condition was already verified in line 6032.

 if (VIsual_active) { if (VIsual_active && (VIsual_mode != wp->w_old_visual_mode || type == INVERTED_ALL))

V571 (2) Recurring check. The 'VIsual_active' condition was already verified in line 1515.

There are a few more places with similar checks. They do not cause much interest for discussion, in most cases they do not bear harm, but in some cases there may be a logical error, therefore such places should be paid attention to.

Just a bad code, in the structure fill only the first byte.

 #ifdef FEAT_TAG_BINS /* This is only to avoid a compiler warning for using search_info * uninitialised. */ vim_memset(&search_info, 0, (size_t)1); #endif

V512 (1) A call of the "memset" function will lead you to go.

The commentary explains why this is done, but the method is rather strange. There are many more beautiful ways to avoid compiler warnings.

Bad practice using short names

 extern char *UP, *BC, PC;

Analyzer Warning: V707 (2) Giving short names for global variables. It is suggested to rename 'UP', 'BC', 'PC' variables.

This practice is not uncommon in the Vim code. Many variables have 1- and 2-letter names, often with a large scope, and in this case generally with a global one. Combined with functions that are 500+ lines long, this leads to very hard-to-read code.

Strange i assignment in condition

 int i = 2; /* index in s[] just after <Esc>[ or CSI */ //... if (n >= 8 && t_colors >= 16 && ((s[0] == ESC && s[1] == '[') || (s[0] == CSI && (i = 1) == 1)) && s[i] != NUL && (STRCMP(s + i + 1, "%p1%dm") == 0 || STRCMP(s + i + 1, "%dm") == 0) && (s[i] == '3' || s[i] == '4'))

Analyzer Warning: V560 (2) A part of the conditional expression is always true: (i = 1) == 1.

It is difficult to say whether this is a mistake or just a strange way to assign i to the unit. Write so obviously not worth it.

Conclusion

In conclusion, I would like to say that now checking projects using PVS-Studio under GNU Linux without using a Windows machine is quite real and quite convenient. Vim helped in this number, for which he was subjected to such a verification scenario first.

This article is in English.

If you want to share this article with an English-speaking audience, then please use the link to the translation: Anton Tokarev. Analyzing Vim by PVS-Studio in GNU / Linux .

Read the article and have a question?

Often our articles are asked the same questions. We collected the answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 . Please review the list.

Source: https://habr.com/ru/post/253553/

All Articles