The subtleties of analyzing C/C++ source code with cppcheck

The previous post covered the main features of the open-source static analyzer cppcheck. It performs respectably even with the default settings, but today's topic is how to squeeze the most out of this analyzer.

This article looks at what cppcheck can do to catch memory leaks, at useful options that improve the analysis, and at an experimental way to create your own rules. There will be no "which analyzer is better" comparisons today; the article is devoted entirely to working with cppcheck.

Download and install

You can download cppcheck from the official website. There is an installer for Windows; for Linux I recommend getting the source code from Git, since it is very easy to build and has no dependencies. Building from source also enables an experimental feature that is discussed at the end of the article.

For my Linux machine, I forked cppcheck on GitHub and cloned the fork with git clone. This makes it possible to commit my own configs and hand-written check rules to the repository while periodically syncing with the main repository, which, you will agree, is very convenient (not to mention the option of sending patches to the project).

Building on Linux

Building on Linux is extremely simple: download, unpack, change directory and run make:

 unzip cppcheck-master.zip
 cd cppcheck-master
 make

Building on Windows

Building on Windows should not be difficult either: there is a project for Visual Studio. Open it and build. I have not tried it myself, since I have had no need to.

Cppcheck as a plugin

As for IDE plugins, there are plugins for Code::Blocks, CodeLite, Gedit and, of course, Eclipse. It does not stop there: there are also plugins for the Hudson and Jenkins build servers and for the TortoiseSVN and Mercurial version control tools. There is no plugin for Visual Studio, but the main page has a very nice phrase about it:

Cppcheck as an external tool. You can also try the proprietary PVS-Studio (there is a free trial), which is for this environment.

Cppcheck has a GUI written in Qt. It simplifies the analysis even further, but we will not go into it in detail: hardcore programmers do not use graphical interfaces :) Besides, the GUI merely mirrors the command-line capabilities (and in some cases falls short of them), so figuring it out after getting to know cppcheck will be no trouble.

It is worth noting that cppcheck is distributed under the GNU GPL license. This lets you take the program's source code, pull it into your own Git repository and adapt it there to any needs, adding your own rules and libraries.

Analyzer Setup

The nice thing about cppcheck is that it can be fine-tuned. You can adjust the sensitivity level, format the message output, and filter out annoying messages.

In fairness, all the information below can be obtained from the documentation or by running cppcheck --help, but I will dwell on the most important nuances in more detail (and, in the original of this article, in Russian :). The cppcheck documentation gets better every year, so it is worth rereading from time to time.

If it seems to you that I am just retelling the documentation, skip to the next section: this one is for those who have never used cppcheck, or analyzers in general.

Error levels

By default, cppcheck tries to report only definite errors. If the analyzer finds a suspicious spot but is not sure it is a bug, it says nothing. This makes it possible to run cppcheck on build servers without false positives stopping the build after every commit.

Here is an example of how this works. Suppose you need to analyze the following code:

 void f() {
     char *a = malloc(100);
     process_a(a);
 }

At first glance there is an error: free is missing. However, if process_a is a library function, it is impossible to say with certainty that it does not call free on the pointer a somewhere inside. If process_a does release a, a leak report would be a false positive that gets in the way of the analysis. So cppcheck first tries to find the implementation of process_a in the code being analyzed, makes sure it does not call free, and only then reports an error. If the implementation is not found, cppcheck assumes the most favorable scenario, namely that process_a frees the memory, and reports nothing. However, cppcheck can be "taught" to recognize library functions, which improves the accuracy of the analysis; this is discussed below.
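To make the first case concrete, here is a minimal sketch in which the callee lives in the same translation unit, so an analyzer can inspect it and see that it never frees its argument. The body of process_a is a hypothetical stand-in (the article does not show one), and f_leaky / f_fixed are illustrative names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for process_a: it only writes to and reads
 * the buffer, and never frees it. */
static size_t process_a(char *a)
{
    strcpy(a, "hello");
    return strlen(a);
}

/* Same shape as f() in the article: the buffer is never released. */
size_t f_leaky(void)
{
    char *a = malloc(100);
    if (a == NULL)
        return 0;
    return process_a(a);   /* leak: the pointer is lost on return */
}

/* Fixed variant: the caller owns the buffer, so the caller frees it. */
size_t f_fixed(void)
{
    char *a = malloc(100);
    if (a == NULL)
        return 0;
    size_t n = process_a(a);
    free(a);
    return n;
}
```

Because process_a is visible here, the "most favorable scenario" assumption is not needed: the analyzer has everything required to tell f_leaky and f_fixed apart.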

Second example:

 void f() {
     char *a = malloc(100);
     if (random())
         g_exit(0);
     free(a);
 }

Here there is clearly a memory leak, since g_exit is a GLib wrapper around the standard exit function. But cppcheck does not know that g_exit terminates the program, so it cannot recognize the error. Cppcheck has to be told somehow that g_exit can terminate the program before it learns to catch such errors.

The examples show that cppcheck plays it safe and keeps false positives low. Most of its checks are not enabled by default. The checks fall into the following categories, each of which can be turned on or off independently:

error - obvious errors that the analyzer considers critical and that usually lead to bugs (enabled by default);
warning - warnings about unsafe code;
style - stylistic issues, reported for sloppy coding (more like recommendations);
performance - performance problems; here cppcheck suggests how to make the code faster (though this does not always give a speedup);
portability - portability errors, usually related to differences between compilers or between systems with different word sizes;
information - informational messages produced during the check (not related to errors in the code);
unusedFunction - an attempt to find unused functions (dead code); does not work in multi-threaded mode;
missingInclude - a check for missing #include directives (for example, random is used but stdlib.h is not included).

Checks are enabled with the --enable parameter, which takes a comma-separated list of check categories. For example:

 cppcheck -q -j4 --enable=warning,style,performance,portability ./source

This is how I usually enable the most important checks. There is also the keyword all, which enables all of the checks listed above.

Note. The -j parameter and the unusedFunction check are incompatible, so -j turns off the unusedFunction check even if it is explicitly requested.

An example of a command that runs the code through all the checks:

 cppcheck -q --enable=all ./source

And that is not all. If your program is error-free from the analyzer's point of view, try running cppcheck with the --inconclusive option. This mode reports everything, including low-probability errors that cppcheck skips by default.

So the most thorough check mode is:

 cppcheck -q --enable=all --inconclusive ./source

Do not forget about the platform!

Cppcheck was designed from the start as a tool that works with different operating systems and platforms. It is therefore essential to keep track of which platform the program is written for and which platform mode cppcheck is using. The platform is selected with the --platform parameter.

Available platforms:

unix32 - all 32-bit *nix systems (including Linux);
unix64 - all 64-bit *nix systems;
win32A - 32-bit Windows with ASCII encoding;
win32W - 32-bit Windows with Unicode encoding;
win64 - 64-bit Windows.

If you need to check code written for Win32 while working on Linux, you must specify the platform:

 cppcheck --platform=win32A ./source

You can also specify the standard the source code is written to, which sometimes refines the check and catches additional errors. Use the --std option with one of the following values:

posix - for POSIX-compatible operating systems (including Linux);
c89 - C, the 1989 standard;
c99 - C, the 1999 standard;
c11 - C, the 2011 standard (the default for C);
c++03 - C++, the 2003 standard;
c++11 - C++, the 2011 standard (the default for C++).

You can use two standards at once:

 cppcheck --std=c99 --std=posix ./source

Useful command line arguments

You have probably noticed that I always use the -q and -j parameters. What are they for? Let's look at the most interesting options.

-j - a very useful parameter that runs the check in multi-threaded mode. It is simple to use: pass the number of processors as the value and the check will go faster.
-q - quiet mode. By default, cppcheck prints informational messages about the progress of the check (and there can be a lot of them). This option disables them completely, leaving only error messages.
-f or --force - check all combinations of ifdef directives (by default, cppcheck checks only a dozen). What this means is covered separately below.
-v - verbose mode: cppcheck prints internal information about the progress of the check.
--xml - print the results in XML format.
--template=gcc - print errors in the gcc warning format (convenient for integration with an IDE that supports this compiler).
--suppress - suppress the errors with the specified identifiers (covered in more detail below).
-h - print help for all parameters, in plain English.

Message Filtering and Exceptions

Like any self-respecting analyzer, cppcheck lets you flexibly customize which errors are shown. The most useful feature is the ability to disable a specific warning in a specific file (and even on a specific line). Messages are disabled with the --suppress parameter, which takes a single exception, or with --suppress-file, which takes a text file containing a list of exceptions. To pass several exceptions on the command line you can repeat --suppress several times, but for that purpose it is better to create a file with exceptions.

Exception format:

 id[:file[:line]]

The id (the error identifier) is mandatory. It can be followed by a colon and a file name, and the file name can in turn be followed by a colon and a line number.
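To illustrate how the three parts of an exception fit together, here is a small parser sketch for the id, id:file and id:file:line forms. This is not cppcheck's own code: the struct and function names are made up for the example, and file names that themselves contain a colon (Windows drive letters, for instance) are deliberately not handled.

```c
#include <stdlib.h>
#include <string.h>

/* One parsed exception: id is mandatory, file and line are optional.
 * Hypothetical type, for illustration only. */
struct suppression {
    char id[64];
    char file[256];   /* empty string when no file was given */
    long line;        /* 0 when no line was given */
};

/* Parse "id", "id:file" or "id:file:line".
 * Returns 0 on success and -1 on malformed input. */
int parse_suppression(const char *spec, struct suppression *out)
{
    const char *first = strchr(spec, ':');
    size_t id_len = first ? (size_t)(first - spec) : strlen(spec);

    if (id_len == 0 || id_len >= sizeof out->id)
        return -1;
    memcpy(out->id, spec, id_len);
    out->id[id_len] = '\0';
    out->file[0] = '\0';
    out->line = 0;
    if (first == NULL)
        return 0;                      /* only the id was given */

    const char *file = first + 1;
    const char *second = strchr(file, ':');
    size_t file_len = second ? (size_t)(second - file) : strlen(file);

    if (file_len == 0 || file_len >= sizeof out->file)
        return -1;
    memcpy(out->file, file, file_len);
    out->file[file_len] = '\0';
    if (second == NULL)
        return 0;                      /* id and file, no line */

    char *end;
    out->line = strtol(second + 1, &end, 10);
    if (end == second + 1 || *end != '\0' || out->line <= 0)
        return -1;                     /* line must be a positive number */
    return 0;
}
```

For instance, "variableScope" yields an entry that applies everywhere, while "variableScope:src/document.c:1099" narrows it to one line of one file.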

For example, very often this warning pops up:

 The scope of the variable '%VAR' can be reduced.

This message shows up in most projects. So that it does not kill the desire to refactor, I recommend turning it off at first. This can be done as follows:

 cppcheck -q -j4 --enable=all --suppress=variableScope ./source

As this example shows, cppcheck uses human-readable error identifiers rather than numbers, which makes them much easier to remember.

The --errorlist parameter prints the complete list of all possible errors in XML format. But I can suggest another way to identify "unwanted" messages: change the message format with the --template parameter:

 cppcheck -q -j4 --enable=all --template='{id} {file}:{line} {message}' ./source

My template prints the message ID first, followed by the file and line number and, finally, the message itself. Finding the ID of an annoying message is now easy.

Sample output

 variableScope geany/src/document.c:1099 The scope of the variable 'use_ft' can be reduced.
 variableScope geany/src/document.c:1257 The scope of the variable 'filename' can be reduced.
 variableScope geany/src/document.c:2306 The scope of the variable 'keywords' can be reduced.
 variableScope geany/src/document.c:3011 The scope of the variable 'old_status' can be reduced.
 variableScope geany/src/editor.c:194 The scope of the variable 'specials' can be reduced.
 variableScope geany/src/editor.c:248 The scope of the variable 'ptr' can be reduced.
 variableScope geany/src/editor.c:1545 The scope of the variable 'text' can be reduced.
 variableScope geany/src/editor.c:4309 The scope of the variable 'tab_str' can be reduced.

Finally, a recipe for automatically creating a file for the --suppress-file parameter. On the command line this takes no time at all:

 cppcheck -q --enable=all --template='{id}:{file}:{line}' ./source > suppress-list.txt

The resulting file can now be fed back to cppcheck, and the output will not contain a single error. This is useful once the analysis is finished and all remaining reports are known to be false positives.

Cppcheck also lets you write exceptions in comments inside the source code, but the approach above avoids cluttering the sources: all exceptions are collected in one place rather than scattered throughout the project.

Dealing with include and define

Cppcheck understands some compiler options, which lets you narrow down which code path is checked. Since cppcheck does not rely on a compiler, it has its own preprocessor. This preprocessor requires neither that all header files be present nor that the source code be correct. If it meets an unknown include, cppcheck simply does not process it.

This uncertainty can play a cruel joke. A common practice with the GLib library is to check arguments like this:

 void f(gchar *s1, gchar *s2) {
     g_return_if_fail(s1);
     gchar *a = g_strdup(s1);
     g_return_if_fail(s2);
     gchar *b = g_strdup(s2);
 }

All is well, except that the g_return_* family are macros that return from the function when the check fails. So if the first argument of f is valid and the second is not, memory leaks. Cppcheck does not know this, because by default it treats g_return_if_fail as a "good function" rather than a macro.

You can change this behavior by making the necessary header files available, so that the preprocessor does all the work: it finds the definition of the g_return_if_fail macro and expands it, and cppcheck sees a conditional return without free, which is a memory leak pattern.

For the preprocessor to work properly, you need to specify where to look for header files. This is done with the -I parameter, which works like the gcc option of the same name. For GLib on Linux the path is quite predictable:

 cppcheck -q -I/usr/include/glib-2.0/ ./source

An interesting feature (one that greatly increases the analysis time) is the enumeration of ifdef variants. If the program contains a single ifdef, cppcheck preprocesses the code twice and checks both versions. The more ifdef branches there are in the source code, the more variants have to be checked. You can control this behavior with the -D and -U parameters: -D A means that macro A is defined, and -U B means that macro B is not defined.

By default, only a dozen configurations are checked. You can change this number with the --max-configs parameter; to check only one configuration, use --max-configs=1. The --force option checks all configurations (very slowly).
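As a small illustration of what "two configurations" means, the sketch below has a single #ifdef, so the analyzer has two different programs to look at: one where the buffer comes from a static pool and one where it comes from malloc. USE_POOL and all function names are hypothetical, invented for this example.

```c
#include <stdlib.h>

/* USE_POOL is a hypothetical project macro. One #ifdef gives two
 * configurations: with USE_POOL defined and without it. */
#ifdef USE_POOL
static char pool[64];
static char *get_buffer(void) { return pool; }        /* nothing to free */
static void put_buffer(char *p) { (void)p; }
#else
static char *get_buffer(void) { return malloc(64); }  /* must be freed */
static void put_buffer(char *p) { free(p); }
#endif

/* Uses the buffer the same way in both configurations.
 * Returns 0 on success, -1 on failure. */
int use_buffer(void)
{
    char *buf = get_buffer();
    if (buf == NULL)
        return -1;
    buf[0] = 'x';
    int ok = (buf[0] == 'x');
    put_buffer(buf);
    return ok ? 0 : -1;
}
```

Passing -D USE_POOL restricts the check to the first branch, and -U USE_POOL to the second; with neither option, both variants are examined.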

Cppcheck does not distinguish macros that come from header files from macros defined in the source code. If you strengthen the analysis by preprocessing every #include and pointing cppcheck at a directory with header files, be ready for a very long run: cppcheck will chew through every macro in every header file the preprocessor reaches.

Using the preprocessor is the easiest way to improve the check, but it obviously increases the analysis time a lot, since preprocessing inflates the source code to hundreds of megabytes. Below, using GLib as the example, I show a more efficient way to catch memory leaks, based on another cppcheck feature: supplying information about third-party libraries.

Writing function implementations ourselves

Note that the -I parameter only tells cppcheck where to look for header files; a header is pulled in only if the source file has a matching #include. There is a slightly more laborious alternative: implement the most common macros and functions by hand and attach them to every project file. Creating such a file takes some work, but the analysis becomes much faster and more accurate. You can also use some interesting macro tricks.

A file with function implementations is attached with the --append parameter. The specified file is automatically inserted at the end of each project file.

A file with macros is attached with the --include parameter. The specified file is automatically inserted at the beginning of each project file.

For example, suppose the program uses the GLib library and you need to tell cppcheck that g_return_if_fail is a macro.

Let's try to analyze this code:

 void f(char *s1, char *s2) {
     g_return_if_fail(s1);
     char *a = g_strdup(s1);
     g_return_if_fail(s2);
     char *b = g_strdup(s2);
     free(a);
     free(b);
 }

Run cppcheck:

 cppcheck -q test.c

Nothing.

Create a file gtk.h with the following content:

 #define g_return_if_fail(expr) do{if(!(expr)){return;}}while(0)

Since this is a macro, it must be included at the beginning:

 cppcheck -q test.c --include=gtk.h

Hm. Nothing again? Look at the example code and you will notice the function g_strdup, which cppcheck still knows nothing about. Let's write the simplest possible implementation (file gtk.c):

 char *g_strdup(const char *s) {
     return strdup(s);
 }

Note that this is a function, not a macro.
Now we analyze. The file with the function implementation is attached with the --append parameter:

 cppcheck -q test.c --include=gtk.h --append=gtk.c
 [test.c:4]: (error) Memory leak: a

Done! Now cppcheck has learned to detect new memory leaks. The smart analyzer went inside the g_strdup code, traced the return value, found strdup there and marked the returned pointers as memory that must be freed.

This example is not made up: it is one of the most common mistakes in GLib programs, and I run into it all the time.
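One possible way to fix the reported leak is to validate every argument before the first allocation, so that no early return can happen while memory is held. The sketch below is self-contained and does not use GLib: return_val_if_fail and dup_string are stand-ins written for this example (for the g_return_* macros and for g_strdup respectively), and f_fixed is an illustrative name.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the g_return_* macros, same shape as the macro
 * written to gtk.h earlier in this section. */
#define return_val_if_fail(expr, val) \
    do { if (!(expr)) { return (val); } } while (0)

/* Stand-in for g_strdup built on malloc, so no GLib is needed. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* All checks come first, all allocations after them.
 * Returns 0 on success and -1 on error. */
int f_fixed(const char *s1, const char *s2)
{
    return_val_if_fail(s1 != NULL, -1);
    return_val_if_fail(s2 != NULL, -1);

    char *a = dup_string(s1);
    char *b = dup_string(s2);
    int status = (a != NULL && b != NULL) ? 0 : -1;

    free(a);   /* free(NULL) is a no-op, so this is safe on failure */
    free(b);
    return status;
}
```

The alternative is to keep the original order and free a before the second early return, but grouping the checks at the top keeps the cleanup in one place.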

Tricks with macros and implementations

Attaching files to a project is a very flexible mechanism. With it you can exclude functions with known behavior, substitute function arguments, and define missing typedefs.

In many cases cppcheck does not need a particular header file to be present or a data type or class to be defined. So writing macros only makes sense when they improve the check.

Non-standard memory allocation

Suppose there is a function my_alloc that allocates memory through its arguments:

 void my_alloc(char **a, char **b);

The following example will not be recognized as a memory leak:

 void f(){
     char *a, *b;
     my_alloc(&a, &b);
 }

Now add the implementation:

 void my_alloc(char **a, char **b) {
     *a = malloc(13);
     *b = malloc(42);
 }

and the check reports the following:

 [test.c:4]: (error) Memory leak: a
 [test.c:4]: (error) Memory leak: b

Excluding a function from the check

If there is a function that we do not want checked, it is easy to hide:

 #define unused_func(arg...)

Memory allocation with an error flag

Some memory-allocating functions report errors through their arguments. For example, this code is correct:

 void f() {
     int is_ok;
     char *a = my_alloc(&is_ok);
     if (is_ok)
         free(a);
 }

However, if cppcheck knows my_alloc only as a memory allocation function, the check will report an error. To avoid it, you can use the following trick:

 char *my_alloc(int *ok) {
     char *a = malloc(42);
     if (a)
         *ok = 1;
     else
         *ok = 0;
     return a;
 }

Analysis that takes library features into account

Most applications use some libraries, glibc in particular. The analyzer sometimes needs information about library functions. For example, malloc allocates memory and free releases it. If there is a malloc but no free, that often means a memory leak. These functions are part of glibc and of the standard, so the analyzer (like the compiler) simply has to know about them.

As for everything else, the analyzer should ideally have a description of each library in use, so that it can detect errors in code that uses more than just glibc.

Why is this so important? When cppcheck meets an unknown function, it assumes that the function lives somewhere in a third-party library and makes the most favorable assumptions about it (it frees memory, does not terminate the program, and so on). If a memory leak or uninitialized variable pattern is found but an unknown function is involved, cppcheck suppresses the error message.

Consider a simple example. On Linux, many graphical applications use the GLib library. It practically duplicates glibc and even has its own malloc / free implementations. Typical code using GLib looks like this:

 gchar *s = g_strdup("test");
 gint *a = g_malloc(sizeof(int) * 10);

A person will guess intuitively that the memory must be freed after g_strdup and g_malloc, even with no prior GLib experience. The same cannot be said of the analyzer: the implementation of g_malloc is hidden inside the library's binary code, and who knows what it does there?

There is one way out: keep a database of the most popular libraries and gradually extend it, recording the particulars of each library's functions. With such a database, the analyzer can check any code that uses the library and find errors in it.

The best part is that cppcheck has such an extensible database. This means the community can improve cppcheck simply by adding new information about libraries, which cppcheck then uses during analysis.

Cppcheck does not load the necessary libraries automatically; this must be done by hand. Libraries are specified with the --library parameter, and several libraries can be given, separated by commas. Cppcheck first looks for a file named after the library with the .cfg extension in the current directory (a convenient way to keep a library for your own project), then tries to find it in its own library database. If the library is not found anywhere, cppcheck reports an error.

An example of analyzing a project with the gtk library:

 cppcheck -q --library=gtk ./source

You can also state explicitly which file the library is in (I must say, without looking at the source code I would never have guessed this was possible):

 cppcheck -q --library=my/path/mylib.xml ./source

Today cppcheck supports only a few libraries:

gtk (in fact this is GLib; GTK itself is not covered)
Qt
windows (all sorts of scanf_s ...)
posix
glibc (the standard library is no longer hard-coded and is gradually being moved into the database)

The list is clearly not rich. Moreover, the contents of this database make you want to shed a tear. Here, for example, is the POSIX database:

 <?xml version="1.0"?>
 <def>
   <function name="usleep">
     <noreturn>false</noreturn>
     <arg nr="1"><not-bool/><valid>0-999999</valid></arg>
   </function>
   <function name="_exit">
     <noreturn>true</noreturn>
   </function>
 </def>

Only one thing can be learned from this file: cppcheck uses XML to expand its database.

What cppcheck currently knows how to "get" from the library database:

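functions that allocate / free memory and resources;
functions that never return (such as exit);
functions that do not free the pointers passed to them (for example, strlen);
valid values of function arguments;
format-string functions (such as printf).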

Library file format

The database is an XML file placed in the cfg directory of the cppcheck project. That is why I recommended downloading the program's source code at the beginning of the article. The location of the cfg files is hard-wired to the cfg subdirectory next to the executable, and this has not been fixed yet. The cfg directory already contains several libraries that can serve as examples for creating your own; there is not much documentation on this topic.

Each file starts with a header:

 <?xml version="1.0"?>

The entire database is wrapped in a def tag. There are three different tags inside def:

memory - information about functions that work with memory;
resource - the same, but for resources (opened / closed); a typical example is open / close;
function - just a function.

The internal structure of memory and resource is the same:

alloc - the allocator name goes inside the tag; the optional attribute init="true|false" indicates whether the allocator initializes the memory;
dealloc - the name of the function that frees whatever the allocators in the same block allocate (several alloc / dealloc tags can be combined in one block);
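use - a function that takes over the pointer passed to it (for example, stores it in a container); after such a call, no leak is reported for that pointer.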

A separate memory or resource block is needed for each group of functions. For example, if memory allocated by malloc must be freed with free, and memory from g_malloc with g_free, the two pairs must go into different memory tags.

Function (the function tag) has the most possibilities. The function name is given in the name attribute of the function tag. Inside the tag you can use:

noreturn - if true, the function never returns, that is, it terminates execution;
leak-ignore - an unusual tag in that it has no content; it means that the function definitely does not free pointers and can be ignored when checking for memory leaks;
arg - argument checks; the argument number is given by the nr attribute (described in detail in the documentation).

Note. The noreturn tag should be set to true only for a function that unconditionally terminates execution; exit is an example. It may seem that this tag describes functions that return no value, but it does not. This peculiarity led to some great oddities when I was compiling the rules for the GLib library.

Describing the GLib library

GLib is a C library that underlies GTK+ and implements object-oriented programming in C (more recently, introspection has been added as well). Many projects are built on it: GIMP, GNOME, Xfce; even Chromium and Firefox use it to some extent.

GLib is cross-platform, so even functions such as malloc / free or printf are duplicated inside GLib so as not to depend on the specific implementation or version of glibc on the supported platforms. As a result, GLib has dozens of memory functions, its own error handling, and in general a lot that cppcheck knows nothing about.

Rudimentary support for GLib appeared in cppcheck only relatively recently. For example, cppcheck can catch obvious memory leaks like this one:

 void f()
 {
     g_malloc(42);
 }

 cppcheck -q test.c --library=gtk
 [test.c:3]: (error) Return value of allocation function g_malloc is not used.

However, GLib has hundreds of functions, and cppcheck cannot handle a subtler case like this:

 void f() {
     gchar *a = g_strdup(s);
     g_strdown(a);
 }

This is because the g_strdown function is unknown. What if it frees the memory itself?

The functions of GLib can be roughly divided into groups:

general-purpose allocator functions, whose results are all freed with the single function g_free;
constructors and destructors of objects;
functions that "absorb" pointers (hash tables, lists); for these, memory leaks should not be reported, or there will be a lot of false positives;
the only function that terminates execution, g_exit;
all other functions, which can be ignored. This list is the most important one, because it tells cppcheck not to hide an error because of a function that does nothing with pointers.

Let's start building the list of rules. GLib and GTK+ contain more than four thousand functions, so collecting them by hand is practically impossible. Fortunately, newer versions of GLib have introspection, which makes it easy to pull all the methods out of an XML file. Unfortunately, the GLib developers did not anticipate that someone would want to know whether memory must be freed after a particular function, so I had to look for constructors and destructors by hand while reading the documentation.

Fortunately, none of this work (for the GLib / GTK+ family) needs to be repeated: it has already been done, and the result is in my repository.

Since I am allergic to XML, I made a simplified format for describing functions; a small parser written in Python then glues the descriptions into one giant XML file. To get the XML file, simply run make. There are two source files: gtk.rules, which lists by hand the functions that cannot be extracted automatically, and gtk-functions.rules, a file generated automatically from the GLib / GTK+ XML interfaces. The parser is written so that functions are not repeated.

At the moment the library is more or less stable; you can pick it up from here and put it in the cppcheck/cfg directory, replacing the old one. After that you can analyze any project written with GLib / GTK+. After testing and weeding out false positives, I will try to send a pull request with this database to cppcheck, so there is a chance of seeing it in upcoming releases.

In addition, I made a header file that lists the popular g_return_* macros. With it, cppcheck will not miss memory errors caused by these macros. This file must be attached with the --include parameter.

Analyzing the Thunar source code

A small example of how using a library affects the analysis. Today's test subject is Thunar version 1.6.3. There will be no in-depth analysis (the demonstration does not need one), just a scan with standard settings.

First, the analysis with "clean" cppcheck:

 cppcheck -q -j4 --max-configs=1 ./Thunar-1.6.3
 [Thunar-1.6.3/thunar/thunar-chooser-dialog.c:450]: (error) Uninitialized variable: app_info

A pretty small catch.

Now we attach the library:

 cppcheck -q -j4 --max-configs=1 --library=gtk --include=gtk.h ./Thunar-1.6.3

Command output

 [Thunar-1.6.3/plugins/thunar-sendto-email/main.c:410]: (error) Memory leak: tmpdir
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-provider.c:166]: (error) Memory leak: dialog
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:150]: (error) Memory leak: label
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:168]: (error) Memory leak: label
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:211]: (error) Memory leak: label
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:251]: (error) Memory leak: label
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:391]: (error) Memory leak: align
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:395]: (error) Memory leak: label
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:399]: (error) Memory leak: align
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:433]: (error) Memory leak: align
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:446]: (error) Memory leak: label
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-editor.c:458]: (error) Memory leak: align
 [Thunar-1.6.3/plugins/thunar-wallpaper/twp-provider.c:301]: (error) Memory leak: escaped_file_name
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-model.c:1520]: (error) Memory leak: command_line
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-model.c:1521]: (error) Memory leak: command_line
 [Thunar-1.6.3/plugins/thunar-uca/thunar-uca-model.c:1522]: (error) Memory leak: command_line
 [Thunar-1.6.3/thunar/thunar-column-editor.c:576]: (error) Memory leak: dialog
 [Thunar-1.6.3/thunar/thunar-chooser-dialog.c:450]: (error) Uninitialized variable: app_info
 [Thunar-1.6.3/thunar/thunar-device-monitor.c:829]: (error) Mismatching allocation and deallocation: devices
 [Thunar-1.6.3/thunar/thunar-folder.c:686]: (error) Mismatching allocation and deallocation: attrs
 [Thunar-1.6.3/thunar/thunar-file.c:1568]: (error) Mismatching allocation and deallocation: argv
 [Thunar-1.6.3/thunar/thunar-list-model.c:1084]: (error) Memory leak: old_order
 [Thunar-1.6.3/thunar/thunar-properties-dialog.c:446]: (error) Memory leak: spacer
 [Thunar-1.6.3/thunar/thunar-properties-dialog.c:534]: (error) Memory leak: spacer
 [Thunar-1.6.3/thunar/thunar-shortcuts-model.c:2210]: (error) Mismatching allocation and deallocation: bookmarks
 [Thunar-1.6.3/thunar/thunar-standard-view.c:1911]: (error) Returning/dereferencing 'file' after it is deallocated / released
 [Thunar-1.6.3/thunar/thunar-window.c:2028]: (error) Memory leak: checksum
 [Thunar-1.6.3/thunar/thunar-window.c:2028]: (error) Memory leak: tooltip

27 new bugs: now that is something!

Are these real bugs or a bunch of false positives?

A leak on an error-handling path is the most common kind of memory leak:

 tmpdir = g_strdup ("/tmp/thunar-sendto-email.XXXXXX");
 if (G_UNLIKELY (mkdtemp (tmpdir) == NULL)) {
     error = g_error_new_literal (G_FILE_ERROR, g_file_error_from_errno (errno), g_strerror (errno));
     tse_error (error, _("Failed to create temporary directory"));
     g_error_free (error);
     return FALSE; /* tmpdir is not freed on this path */
 }
 /* ... */
 g_free(tmpdir);
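The same pattern and its fix can be sketched in plain standard C, without GLib. This is only an illustration: `probe_file` is a hypothetical helper, not Thunar code, and `fopen` stands in for the failing `mkdtemp` call. The point is that every early return must release what was allocated before it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from Thunar): copies `name` to the heap and
 * tries to open it. Returns 1 if the file could be opened, 0 otherwise. */
int probe_file(const char *name)
{
    char *path = malloc(strlen(name) + 1);
    if (path == NULL)
        return 0;
    strcpy(path, name);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        free(path);   /* without this line the early return leaks `path` */
        return 0;
    }

    fclose(fp);
    free(path);       /* normal path: released as usual */
    return 1;
}
```

Removing the `free(path)` inside the error branch reproduces the kind of code the leak reports above point at.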

Similar examples

 escaped_file_name = g_shell_quote (file_name);
 switch (desktop_type) {
 case DESKTOP_TYPE_XFCE:
     ...
     break;
 case DESKTOP_TYPE_NAUTILUS:
     ...
     break;
 default:
     return; /* leaks escaped_file_name; why not just break? */
     break;
 }
 g_free (escaped_file_name);

 GString *command_line = g_string_new (NULL);
 GList *lp;
 gchar *dirname;
 gchar *quoted;
 gchar *path;
 gchar *uri;

 g_return_val_if_fail (THUNAR_UCA_IS_MODEL (uca_model), FALSE);
 g_return_val_if_fail (iter->stamp == uca_model->stamp, FALSE);
 g_return_val_if_fail (error == NULL || *error == NULL, FALSE);

Allocator / deallocator mismatch. In effect this is also a memory leak, since the contents of the array are not released. The code below runs without errors but leaks the elements of the array:

 gchar **attrs;
 ...
 attrs = g_file_info_list_attributes (info1, NULL);
 ...
 g_free (attrs);
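To show what the correct deallocator has to do, here is a minimal sketch in plain C of a NULL-terminated string vector and a function that releases it element by element, in the way `g_strfreev` does for GLib string arrays. The names `make_vector` and `free_vector` are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a NULL-terminated vector holding `n`
 * heap-allocated copies of `word`. Returns NULL on allocation failure. */
char **make_vector(const char *word, size_t n)
{
    char **v = malloc((n + 1) * sizeof *v);
    if (v == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        v[i] = malloc(strlen(word) + 1);
        if (v[i] == NULL) {
            while (i > 0)          /* undo the partial allocation */
                free(v[--i]);
            free(v);
            return NULL;
        }
        strcpy(v[i], word);
    }
    v[n] = NULL;                   /* terminator */
    return v;
}

/* Frees every element and then the vector itself.
 * Returns the number of elements released. */
size_t free_vector(char **v)
{
    size_t released = 0;
    if (v == NULL)
        return 0;
    for (char **p = v; *p != NULL; p++) {
        free(*p);
        released++;
    }
    free(v);
    return released;
}
```

Calling plain `free(v)` on such a vector releases only the array of pointers, so every string it points to is lost. That is the situation the mismatch report above describes.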

Returning a pointer after it has been released:

 GtkTreePath *path = NULL;
 GtkTreeIter iter;
 ThunarFile *file = NULL;

 path = (*THUNAR_STANDARD_VIEW_GET_CLASS (standard_view)->get_path_at_pos) (standard_view, x, y);
 if (G_LIKELY (path != NULL)) {
     gtk_tree_model_get_iter (GTK_TREE_MODEL (standard_view->model), &iter, path);
     file = thunar_list_model_get_file (standard_view->model, &iter);
     if (!thunar_file_is_directory (file) && !thunar_file_is_executable (file)) {
         g_object_unref (G_OBJECT (file)); /* file is released here! */
         gtk_tree_path_free (path);
         path = NULL;
     }
 }
 return file;

This warning is debatable: g_object_unref only decrements a reference count, so it is not known whether the object is actually destroyed. Still, the code deserves a look.
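Why an unref is not necessarily a release can be seen in a toy reference counter. This is a deliberately simplified sketch, not GObject's implementation. The `Object` type and its functions are invented for the example.

```c
#include <stdlib.h>

/* Toy reference-counted object (hypothetical, not GObject). */
typedef struct {
    int refs;
    int payload;
} Object;

Object *object_new(int payload)
{
    Object *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refs = 1;          /* the creator holds the first reference */
        o->payload = payload;
    }
    return o;
}

Object *object_ref(Object *o)
{
    o->refs++;
    return o;
}

/* Drops one reference. Returns 1 if the object was destroyed,
 * 0 if other references keep it alive. */
int object_unref(Object *o)
{
    if (--o->refs == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```

If someone else (a model, for instance) still holds a reference, the pointer stays valid after the unref. A static analyzer usually cannot know that, which is why the report is worth checking by hand either way.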

False positives. There were some: certain GTK functions hand an object over to a parent, which then destroys it automatically. Unless such functions are described to cppcheck, it will complain.

Examples

 static void manage_actions (GtkWindow *window)
 {
     GtkWidget *dialog;

     dialog = g_object_new (THUNAR_UCA_TYPE_CHOOSER, NULL);
     gtk_window_set_transient_for (GTK_WINDOW (dialog), window);
     gtk_widget_show (dialog);
 }

 label = g_object_new (GTK_TYPE_LABEL, "label", _("Appears if selection contains:"), "xalign", 0.0f, NULL);
 gtk_table_attach (GTK_TABLE (table), label, 0, 2, 2, 3, GTK_EXPAND | GTK_FILL, GTK_FILL, 0, 0);

Or an object created with g_object_new and destroyed with gtk_widget_destroy:

 dialog = g_object_new (THUNAR_TYPE_COLUMN_EDITOR, NULL);
 ...
 gtk_widget_destroy (dialog);

Or an array allocated with g_new0 and released with g_strfreev:

 devices = g_new0 (gchar *, length + 2);
 ...
 g_strfreev (devices);

Here cppcheck is confused by a mix of stack (g_newa) and heap (g_new) allocation:

 /* be sure to not overuse the stack */
 if (G_LIKELY (length < 2000)) {
     old_order = g_newa (GSequenceIter *, length);
     new_order = g_newa (gint, length);
 } else {
     old_order = g_new (GSequenceIter *, length);
     new_order = g_new (gint, length);
 }
 ...
 /* clean up if we used the heap */
 if (G_UNLIKELY (length >= 2000)) {
     g_free (old_order);
     g_free (new_order);
 }

The nice thing about these false positives is that they are easy to fix by patching the library's XML configuration file. This way the number of reported errors was brought down to 14. Over time, introspection in GLib will improve, and it will become possible to gather information about the library automatically.

Writing rules for cppcheck

Finally, the tastiest part. Here we look at how to implement your own checks for cppcheck. Perhaps you have already found some bug and would like to check the entire project for similar errors. Or you are a team lead whose programmers regularly make the same typical mistakes, and you would like to catch them automatically. Keeping your own code in check helps you develop efficient and secure programs. When you find a bug, write not only a regression test for it but also an analyzer rule.

What is under the hood?

First, a few words about how cppcheck works. Before the analysis itself, cppcheck runs a preprocessor, much as a compiler does, and then simplifies the source code. That is: all unnecessary indentation and whitespace is removed, and every token is separated from the next by exactly one space. All constants that can be evaluated during preprocessing are computed. {} blocks are inserted everywhere, even where they were omitted. If a variable is declared or assigned inside an if / for / while statement, that declaration or assignment is moved outside of it.

Since everyone's coding style is different, cppcheck reduces the code to a kind of “normal form”. For example, one programmer writes a loop like this:

 for(int i = 0; i < 10; i++)
     if(i % 2)
         printf("%d\n", i);

And another like this:

 int i;
 for(i = 0; i < 10; i++) {
     if(i % 2)
         printf("%d\n", i);
 }

Cppcheck reduces both to the same form:

 int i ; for ( i = 0 ; i < 10 ; i ++ ) { if ( i % 2 ) { printf ( "%d\n" , i ) ; } }

Not very readable, but easy to analyze. All tokens are neatly separated by spaces, with nothing superfluous.
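To get a feel for the "one space between tokens" part of this normal form, here is a toy normalizer in C. It is a rough sketch under simplifying assumptions, not cppcheck's Tokenizer: it knows only identifiers, numbers, string literals, and a handful of two-character operators, and it does not insert braces or move declarations. The function name `normalize` is invented for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_word(int c)
{
    return isalnum(c) || c == '_';
}

/* Toy normalizer (hypothetical): copies the tokens of `src` into `dst`,
 * separated by exactly one space. Returns 0 on success, -1 if `dst`
 * (of capacity `cap`) is too small. */
int normalize(const char *src, char *dst, size_t cap)
{
    static const char *const two_char[] = {
        "++", "--", "==", "!=", "<=", ">=", "&&", "||"
    };
    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0;

    if (cap == 0)
        return -1;

    while (*p != '\0') {
        if (isspace(*p)) {              /* whitespace only separates tokens */
            p++;
            continue;
        }

        const unsigned char *start = p;
        if (is_word(*p)) {              /* identifier, keyword or number */
            while (is_word(*p))
                p++;
        } else if (*p == '"') {         /* string literal stays one token */
            p++;
            while (*p != '\0' && *p != '"') {
                if (*p == '\\' && p[1] != '\0')
                    p++;                /* skip the escaped character */
                p++;
            }
            if (*p == '"')
                p++;
        } else {                        /* operator or punctuation */
            size_t len = 1;
            for (size_t i = 0; i < sizeof two_char / sizeof two_char[0]; i++)
                if (strncmp((const char *)p, two_char[i], 2) == 0)
                    len = 2;
            p += len;
        }

        size_t n = (size_t)(p - start);
        if (out + n + 2 > cap)          /* token + separator + terminator */
            return -1;
        if (out > 0)
            dst[out++] = ' ';
        memcpy(dst + out, start, n);
        out += n;
    }

    dst[out] = '\0';
    return 0;
}
```

Feeding it `if(a)   free(a);` yields `if ( a ) free ( a ) ;`. The real simplifier would additionally wrap the `free` call in braces.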

Cppcheck uses several levels of simplification: simple, normal, and raw. For example, at the “simple” level the sizeof operator is replaced by a number, while at the “normal” level it remains an operator. This is sometimes useful for finding errors related to a particular operator. The “raw” level is the code in its original form, which the preprocessor has not yet touched and which has only been brought to normal form.

Thus, cppcheck builds a model of the source code (the Tokenizer class) in which all tokens are simply separated by a space, so that the analysis modules can use these tokens easily. An analysis module can in turn build its own model if it needs one, navigate between tokens, determine the type of a token, and so on. At the moment there is the basic model (token splitting, which is used almost everywhere), ValueFlow, which is used to check for memory leaks, and an experimental AST module (syntax tree). Rule authors can use any of these models.

Rather than guess how cppcheck simplifies a particular construct, you can use debug mode, in which cppcheck prints the simplified version of the code:

 cppcheck --debug ./file.cpp

Regular Expression Based Rules

Cppcheck lets you extend its capabilities with regular expressions. The scheme is as follows: the source code is simplified and glued together into a single line, and then a regular expression is applied to that line. If a match is found, cppcheck issues a warning and reports the line of code that triggered the rule.

Before using this feature, you need to recompile cppcheck with experimental regular expression support enabled (which is why I recommend getting the source from git). This requires the pcre library. Compiling is just as easy as before:
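The scheme can be imitated in a few lines of C. Note the assumptions: cppcheck itself uses PCRE, while this sketch uses POSIX `regcomp` / `regexec` with a basic regular expression as a stand-in, so the pattern syntax differs (in a POSIX basic expression `\(` `\)` form a group and plain parentheses are literal). The names `find_rule` and `redundant_free_check` are made up for the example, and the pattern is an analog of the kind of rule discussed in this article.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX basic regex analog of a "redundant check before free" rule.
 * Illustrative only: cppcheck rules are written in PCRE syntax. */
const char *const redundant_free_check =
    "if ( \\([A-Za-z_][A-Za-z0-9_]*\\) ) { free ( \\1 ) ; }";

/* Applies `pattern` to the one-line simplified code.
 * Returns the offset of the first match, -1 if there is no match,
 * -2 if the pattern does not compile. */
long find_rule(const char *pattern, const char *simplified)
{
    regex_t re;
    regmatch_t m[2];
    long result = -1;

    if (regcomp(&re, pattern, 0) != 0)
        return -2;
    if (regexec(&re, simplified, 2, m, 0) == 0)
        result = (long)m[0].rm_so;   /* where the offending code starts */
    regfree(&re);
    return result;
}
```

The backreference `\1` is what ties the variable in the condition to the variable passed to `free`: the simplified line `if ( a ) { free ( b ) ; }` does not match.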

 make HAVE_RULES=yes

Cppcheck now has two new options: --rule, which takes a regular expression directly on the command line, and --rule-file, which takes an XML file with your own rules.

The easiest way to get acquainted with the new feature is by example. A test case:

 void f() {
     if(a) free(a);
 }

Next, the check. Use a regular expression that matches everything:

 cppcheck -q --rule=".*" test.c
 [test.c:1]: (style) found ' void f ( ) { if ( a ) { free ( a ) ; } }'

Fine! Now it is clear what the "simplified" form looks like.

Now let's write a rule that finds unnecessary checks of a variable before freeing it:

 cppcheck -q --rule='if \( (\w+) \) { free \( \1' test.c
 [test.c:2]: (style) found 'if ( a ) { free ( a'

The pattern works. It remains to write it to the rule file. Create a rules.xml file, refining the regular expression along the way:

 <?xml version="1.0"?>
 <rule>
   <pattern>if \( (\b\w+\b) \) { (?:g_)?free \( \b\1\b \) ; }</pattern>
   <message>
     <severity>style</severity>
     <summary>Redundant condition. It is valid to free a NULL pointer.</summary>
   </message>
 </rule>

Now you can check source code using this file as the rule base:

 cppcheck -q --rule-file=rules.xml test.c
 [test.c:2]: (style) Redundant condition. It is valid to free a NULL pointer.

The format of the XML file is fairly self-explanatory. Sometimes you need a different version of the simplified code, for example raw: source code without simplifications. In that case it is enough to add one tag to the rule:

 <tokenlist>raw</tokenlist>

Inside the tokenlist tag you can use:

raw - code with minimal simplification
normal - the default mode
simple - the most simplified code
define - for checking preprocessor directives; macro definitions are not stripped from the source

If you remove the summary tag, cppcheck prints the text that matched the regular expression, which works as a kind of debugging mode.

Some examples of rules:

A function without arguments should be declared with void

Always specify void:

 void f() {}     /* bad */
 void f(void) {} /* good */

 <rule>
   <tokenlist>raw</tokenlist>
   <pattern>\( \) {</pattern>
   <message>
     <severity>style</severity>
     <summary>Always specify void even if a function accepts no arguments</summary>
   </message>
 </rule>

Catching increment / decrement inside sizeof

 <rule>
   <tokenlist>raw</tokenlist>
   <pattern>sizeof \( [^)]*(?:\w+ [+-]{2}|[+-]{2} \w+)[^)]* \)</pattern>
   <message>
     <severity>warning</severity>
     <summary>Operands to the sizeof operator should not contain side effects</summary>
   </message>
 </rule>
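The reason such code is suspicious: for an ordinary (non-variable-length) operand, sizeof does not evaluate its argument, so the increment silently never happens. A minimal sketch, with a function name invented for the example:

```c
#include <stddef.h>

/* Returns the value of the counter after it has appeared inside sizeof. */
int counter_after_sizeof(void)
{
    int i = 0;
    size_t n = sizeof(i++);   /* the operand is not evaluated here */
    (void)n;                  /* only the size of int was computed */
    return i;
}
```

The function returns the counter unchanged, which is almost never what the author of `sizeof(i++)` meant. Some compilers warn about this construct on their own.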

Do not pass string literals to mktemp

 <rule>
   <pattern>mktemp \( "[^"]+" \)</pattern>
   <message>
     <severity>error</severity>
     <summary>The mktemp() function modifies its string argument</summary>
   </message>
 </rule>

Avoid the gets function altogether

 <rule>
   <pattern> gets \( \w+ \)</pattern>
   <message>
     <severity>error</severity>
     <summary>The gets() function is obsolescent, and is deprecated</summary>
   </message>
 </rule>
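What to write instead of gets: the usual replacement is fgets, which takes the buffer size and therefore cannot overflow the buffer. A minimal sketch, in which the wrapper name `read_line` is an assumption of this example:

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: reads one line of at most cap-1 characters from
 * `in` into `buf` and strips the trailing newline, if any.
 * Returns 1 on success, 0 on end of file, error, or an unusable buffer. */
int read_line(FILE *in, char *buf, size_t cap)
{
    if (cap == 0 || cap > INT_MAX)
        return 0;
    if (fgets(buf, (int)cap, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline kept by fgets */
    return 1;
}
```

A line longer than the buffer is simply cut off, and the rest stays in the stream for the next call.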

As you know, I am allergic to XML, so I created a repository with a slightly simplified rule format and a compiler for it written in Python. The XML rule base I compiled contains the basic CERT recommendations for good C programming practice (not C++!) and is slowly growing. As they say, contributions are welcome.

Of course, regular-expression rules are far from perfect: they are less capable, and they cannot distinguish a variable from a function or an operator. But they are useful when written for a specific project with its own peculiarities and its own bug history. The rules can be used together with regression tests: after finding a bug by hand, you never know whether the same bug exists somewhere else in the code or will appear in the future.

If you want to know more, there is some good documentation on writing rules for cppcheck: one, two, three (the third part is devoted to developing rules in C++). You can greatly help the project by sending the developers patches and reports of false positives and bugs, and feel free to do PR (no, not that kind: a Pull Request :).

Source: https://habr.com/ru/post/210256/
