How PVS-Studio looks for errors: methods and technologies

PVS-Studio is a static source code analyzer for finding errors and vulnerabilities in C, C++ and C# programs. In this article I want to give an overview of the technologies we use in the PVS-Studio analyzer to detect errors in program code. Besides general theory, I will use practical examples to show how each technology helps detect errors.

Introduction

The reason for writing this article was my talk at the open conference ISPRAS OPEN 2016, held in early December in the Main Building of the Russian Academy of Sciences. The topic of the talk: "Principles of operation of the PVS-Studio static code analyzer" (presentation in pptx format).

Unfortunately, the time for the talk was very limited, so I had to prepare a very short presentation and leave out much of what I wanted to say. Therefore, I decided to write this article, in which I describe in more detail the approaches and algorithms we use in developing PVS-Studio.

At the moment, PVS-Studio is in fact two separate analyzers: one for C++ and the other for C#. Moreover, they are written in different languages: we develop the C++ analyzer core in C++ and the C# analyzer core in C#.
However, we use the same approach for both cores, and several employees work on both C++ and C# diagnostics. Therefore, in the rest of the article I will not treat the two analyzers separately: the description of the mechanisms applies to both. Of course, there are some differences, but they are insignificant for an overview. Where it matters, I will state clearly whether I mean the C++ or the C# analyzer.

Team

Before proceeding to the description of the analyzer, I will say a few words about our company and our team.

The PVS-Studio analyzer is developed by the Russian company OOO "Program Verification Systems". The company is funded entirely by revenue from PVS-Studio sales. Its office is in Tula, 200 km from Moscow.

Website: http://www.viva64.com

At the time of this writing, the company employs 24 people.


Some people think that such a product as a code analyzer can be made by one person. However, this is a big job that requires a lot of person-years. And even more person-years are required in order to maintain and develop it.

We see our mission as popularizing the static code analysis methodology and, of course, earning money by developing a powerful tool that identifies as many errors as possible at the earliest stages of application development.

Our achievements

To popularize PVS-Studio, we regularly check various open-source projects and describe the errors found in them in articles. So far we have checked about 270 projects.

While writing these articles, we have found more than 10,000 errors and reported them to the project authors. We are very proud of this, and I will now explain why.

If you divide the number of errors found by the number of projects, you get a not very impressive number: about 40 errors per project. Therefore, I want to highlight an important point. These 10,000 errors are a side effect. We never set out to reveal as many errors as possible. Often we stop when we have found enough defects in a project to write an article.

This demonstrates the convenience and capabilities of the analyzer very well. We are proud that you can take an unfamiliar project and find errors in it almost immediately. If this were not the case, we would not have found 10,000 errors simply as a side effect of writing blog articles.

PVS-Studio

In short, PVS-Studio is:

More than 340 diagnostics for C and C++;
Over 120 diagnostics for C#;
Windows;
Linux;
Plugin for Visual Studio;
Quick start (compilation monitoring);
Various support features, such as integration with SonarQube and IncrediBuild.

Why C and C++

C and C++ are extremely efficient and elegant languages. In return, they demand great attention from the programmer and deep knowledge of the subject area. That is why static code analyzers have long been well established among C and C++ developers. Moreover, although languages, compilers and development tools keep evolving, the same errors keep appearing. Let me explain with an example what I mean.


To mark the 30th anniversary of C++, we checked Cfront, the first C++ compiler, written in 1985. If you are interested in the details, I suggest reading the article "On the thirtieth anniversary of the first C++ compiler: looking for errors in Cfront".

We found the following error in it:

Pexpr expr::typ(Ptable tbl)
{
  ....
  Pclass cl;
  ....
  cl = (Pclass) nn->tp;
  cl->permanent=1;                                     // <= use
  if (cl == 0) error('i',"%k %s'sT missing",CLASS,s);  // <= test
  ....

First, the pointer cl is dereferenced, and only then is it checked against NULL.

It's been 30 years.

Now here is code not from Cfront but from the modern Clang compiler. This is what PVS-Studio detects in it:

....
Value *StrippedPtr = PtrOp->stripPointerCasts();
PointerType *StrippedPtrTy =
  dyn_cast<PointerType>(StrippedPtr->getType());  // <= use
if (!StrippedPtr)                                 // <= test
  return 0;
....

As the saying goes, "Bugs. C++ bugs never change." The StrippedPtr pointer is dereferenced first, and only then is it checked against NULL.
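To make the "use, then test" pattern concrete, here is a toy sketch of how such a check could be expressed. It assumes a drastically simplified, hypothetical representation in which a function body is just an ordered list of dereference and null-check events. PVS-Studio itself works on the parse tree and the control-flow graph, so this only illustrates the idea and is not its implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model: a function body reduced to an
// ordered list of events on named pointers. A real analyzer works on the
// parse tree and the control-flow graph, not on a flat list.
enum class EventKind { Dereference, NullCheck };

struct Event {
    EventKind kind;
    std::string pointer;  // name of the pointer variable involved
};

static bool contains(const std::vector<std::string>& v, const std::string& s) {
    for (const std::string& x : v)
        if (x == s) return true;
    return false;
}

// Returns the pointers that are dereferenced before their first null check,
// i.e. the "use, then test" pattern from the Cfront and Clang fragments.
std::vector<std::string> findUseBeforeCheck(const std::vector<Event>& body) {
    std::vector<std::string> used;      // dereferenced before any check
    std::vector<std::string> checked;   // already compared with null
    std::vector<std::string> reported;  // result
    for (const Event& e : body) {
        if (e.kind == EventKind::Dereference) {
            if (!contains(checked, e.pointer) && !contains(used, e.pointer))
                used.push_back(e.pointer);
        } else {  // EventKind::NullCheck
            if (contains(used, e.pointer) && !contains(checked, e.pointer))
                reported.push_back(e.pointer);
            if (!contains(checked, e.pointer))
                checked.push_back(e.pointer);
        }
    }
    return reported;
}
```

Feeding it the Cfront sequence (dereference of cl, then a null check of cl) reports cl; the reversed order reports nothing.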

Code analyzers are extremely useful for C and C++. That is why we have developed, and will keep developing, the PVS-Studio analyzer for these languages. We do not expect these tools to run out of work: the languages are extremely popular and, at the same time, extremely dangerous.

Why C#

Of course, in some respects C# is more polished and safer than C++. Still, it has not escaped all the old problems, and it gives programmers plenty of headaches too. I will confine myself to one example; in general, this is a topic for a separate article.


Here we meet an old friend again: the error described above. A fragment from the PowerShell project:

....
_parameters = new Dictionary<string, ParameterMetadata>(
  other.Parameters.Count,                // <= use
  StringComparer.OrdinalIgnoreCase);
if (other.Parameters != null)            // <= test
....

First, the other.Parameters reference is used to get the Count property, and only then is it checked for null.

As you can see, calling pointers "references" in C# did not make things any better. As for typos of all kinds, they do not depend on the language at all. In short, there is plenty of work for PVS-Studio, and we are actively developing the C# analyzer.

What's next?

So far we have no firm plans for which language to support next. We have two candidates: Objective-C and Java. We lean toward Java, but we have not made a final decision yet.


What technologies we do not use in PVS-Studio

Before talking about the internal structure of PVS-Studio, I will briefly note what PVS-Studio does not use.

PVS-Studio has nothing to do with the Prototype Verification System (PVS). This is just a coincidence. The abbreviation PVS comes from the name of our company, Program Verification Systems.

PVS-Studio does not directly use the mathematical apparatus of formal grammars to find errors. The analyzer works at a higher level: the analysis is performed on the parse tree.

PVS-Studio does not use the Clang compiler to analyze C/C++ code. Clang is used only for the preprocessing step. You can learn more about this in the article "A little about the interaction of PVS-Studio and Clang". To build the parse tree, we use our own parser, originally based on the now-forgotten OpenC++ library. However, almost nothing of that library's code remains, and we implement support for new language constructs ourselves.

When working with C# code, we rely on Roslyn. The PVS-Studio C# analyzer checks the program source code directly, which makes the analysis more accurate than checking the compiled bytecode (Common Intermediate Language).

PVS-Studio does not use string matching or regular expressions to find errors. This is a dead end: the approach has so many flaws that it is impossible to build even a reasonably good analyzer on it, and many diagnostics simply cannot be implemented this way. This topic is discussed in more detail in my article "Static Analysis and Regular Expressions".

What technologies do we use in PVS-Studio

To ensure the high quality of the static analysis results, we use advanced methods for analyzing the source code of a program and its control flow graph. Let's take a look at them.

Note. Below, I use several diagnostics as examples and briefly describe how they work. I deliberately omit the cases in which a diagnostic must stay silent, so as not to overload the article with details. I am writing this note for those who have never developed analyzers: do not assume that everything is as simple as it looks below. Writing the diagnostic itself is only 5% of the work. Complaining about suspicious code is easy; not complaining about correct code is much harder. 95% of the time spent developing a diagnostic goes into teaching the analyzer to recognize programming techniques that look suspicious to the diagnostic but are in fact correct.

Pattern-based analysis

It is used to find places in the source code that resemble known error patterns. There are many patterns, and they vary greatly in how hard they are to detect. Some diagnostics for detecting typos even rely on empirical algorithms.


To begin with, let's look at two of the simplest cases that pattern-based analysis can detect. The first simple case:

if ((*path)[0]->e->dest->loop_father != path->last()->e->....)
{
  delete_jump_thread_path (path);
  e->aux = NULL;
  ei_next (&ei);
}
else
{
  delete_jump_thread_path (path);
  e->aux = NULL;
  ei_next (&ei);
}

PVS-Studio warning: V523 The 'then' statement is equivalent to the 'else' statement. tree-ssa-threadupdate.c 2596

Regardless of the condition, the same set of actions is always performed. This is simple enough to need no further explanation. By the way, I found this code fragment not in a student's term paper but in the GCC compiler. The results of the GCC check can be found in the article "Finding errors in the GCC compiler code using the PVS-Studio analyzer".

The second simple case (code taken from the FCEUX project):

 if((t=(char *)realloc(next->name,strlen(name+1))))

PVS-Studio warning: V518 The 'realloc' function allocates a strange amount of memory calculated by 'strlen (expr)'. Perhaps the correct variant is 'strlen (expr) + 1'. fceux cheat.cpp 609

This diagnostic looks for the following error pattern. Programmers know that when they allocate memory for a string, they must allocate one extra character for the terminating null. In other words, they know they must add +1 or +sizeof(TCHAR). Sometimes, however, they do it carelessly and add 1 not to the value returned by strlen but to the pointer.

That is exactly what happened here. Instead of strlen(name + 1), the code should have been strlen(name) + 1.

Because of this error, slightly less memory is allocated than required. A buffer overflow follows, with unpredictable consequences. The program may even appear to work correctly if, by luck, the two bytes after the allocated buffer are not in use. In an even worse scenario, the defect may cause secondary errors that manifest in a completely different place.
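For comparison, here is a minimal sketch of the correct pattern: the +1 is added to the length returned by strlen, not to the pointer. The function name storeName is hypothetical and is not taken from FCEUX.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not the FCEUX code): stores a copy of `name` in a
// buffer that may be reallocated. Returns nullptr if allocation fails, in
// which case the old buffer is left untouched (realloc semantics).
char* storeName(char* old, const char* name) {
    std::size_t len = std::strlen(name);  // characters, without '\0'
    // Correct: room for len characters plus the terminating null.
    // The buggy strlen(name + 1) would give len - 1 for a non-empty
    // string, i.e. two bytes too few.
    char* buf = static_cast<char*>(std::realloc(old, len + 1));
    if (buf != nullptr) {
        std::memcpy(buf, name, len + 1);  // copies the '\0' as well
    }
    return buf;
}
```

For "Mario", strlen(name + 1) is 4, while the buffer needs strlen(name) + 1 = 6 bytes.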

Now let us look at a pattern of medium complexity.

The diagnostic is formulated as follows: warn if, after a conversion with the as operator, the original object is checked for null instead of the result of the as operator.

Consider a code snippet from the CodeContracts project:

public override Predicate JoinWith(Predicate other)
{
  var right = other as PredicateNullness;
  if (other != null)
  {
    if (this.value == right.value)
    {

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'other', 'right'. CallerInvariant.cs 189

Note that it is the variable other, not right, that is checked for null. This is clearly a mistake, because the code then goes on to work with the right variable.

Finally, a complex pattern related to the use of macros.

The problem arises when a macro expands in such a way that an operator inside the macro body has higher precedence than an operator in the argument passed to it. Example:

#define RShift(a) a >> 3
....
RShift(a & 0xFFF) // a & 0xFFF >> 3

To fix the problem, put the argument a in parentheses (and, better still, the whole macro body as well), that is, write:

#define RShift(a) ((a) >> 3)

Then the macro expands correctly:

 RShift(a & 0xFFF) // ((a & 0xFFF) >> 3)
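The effect of the missing parentheses can be checked directly. The sketch below defines both variants of the macro under hypothetical names (RShiftBad, RShiftGood) and feeds them the same argument.

```cpp
#include <cassert>

// The two macro variants from the text, under illustrative names.
#define RShiftBad(a)  a >> 3        // neither argument nor body parenthesized
#define RShiftGood(a) ((a) >> 3)    // fixed version

// Because >> binds tighter than &, RShiftBad(x & 0xFFF) expands to
// x & 0xFFF >> 3, which is x & (0xFFF >> 3), i.e. x & 0x1FF.
unsigned bad(unsigned x)  { return RShiftBad(x & 0xFFF); }
// RShiftGood(x & 0xFFF) expands to ((x & 0xFFF) >> 3), as intended.
unsigned good(unsigned x) { return RShiftGood(x & 0xFFF); }
```

For x = 0xF0, bad() returns 240 while good() returns 30, because the unparenthesized form never shifts the masked value.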

The pattern is simple to describe, but implementing the diagnostic is very difficult in practice. Analyzing "#define RShift(a) a >> 3" alone is not enough: warning on every such line would produce far too many false positives. The analyzer has to look at how the macro is expanded in each particular case and try to tell deliberate code apart from code that really lacks parentheses.

Here is this error in real code from the FreeBSD project:

#define ICB2400_VPINFO_PORT_OFF(chan) \
  (ICB2400_VPINFO_OFF +               \
   sizeof (isp_icb_2400_vpinfo_t) +   \
   (chan * ICB2400_VPOPT_WRITE_SIZE))
....
off += ICB2400_VPINFO_PORT_OFF(chan - 1);

PVS-Studio warning: V733 Check expression: chan - 1 * 20. isp.c 2301
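The V733 message shows the expanded expression chan - 1 * 20. The sketch below reproduces the shape of the FreeBSD macro so that the two expansions can be compared. Only the write size of 20 is implied by the warning; the other two constants are made-up values, and the macro names are hypothetical.

```cpp
#include <cassert>

// Hypothetical values: only ICB2400_VPOPT_WRITE_SIZE == 20 is implied by
// the warning text; the other two constants are invented for the demo.
constexpr long kVpinfoOff  = 100;  // stands in for ICB2400_VPINFO_OFF
constexpr long kVpinfoSize = 16;   // stands in for sizeof(isp_icb_2400_vpinfo_t)
constexpr long kWriteSize  = 20;   // ICB2400_VPOPT_WRITE_SIZE

// Same shape as the FreeBSD macro: the parameter is not parenthesized.
#define PORT_OFF_BAD(chan)  (kVpinfoOff + kVpinfoSize + (chan * kWriteSize))
// Fixed: (chan) is evaluated as a unit before the multiplication.
#define PORT_OFF_GOOD(chan) (kVpinfoOff + kVpinfoSize + ((chan) * kWriteSize))

long offBad(long chan)  { return PORT_OFF_BAD(chan - 1); }   // chan - 1 * 20
long offGood(long chan) { return PORT_OFF_GOOD(chan - 1); }  // (chan - 1) * 20
```

With chan = 3, the original form yields 99 (116 plus 3 - 20), while the parenthesized form yields 156 (116 plus 2 * 20).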

Type inference

Type inference based on the semantic model of the program allows the analyzer to have complete information about all variables and expressions found in the code.


In other words, the analyzer must know whether the token Foo is a variable name, a class name or a function. The analyzer largely repeats the work of the compiler, which also needs to know the exact type of an object and all the accompanying information about it: its size, whether it is signed or unsigned, which classes it inherits from if it is a class, and so on.

That is why the PVS-Studio analyzer requires *.c/*.cpp files to be preprocessed. Only by analyzing the preprocessed file can we collect all the information about types. Without it, many diagnostics cannot be performed at all, or they produce a lot of false positives.

Note. If someone claims that their analyzer can check *.c/*.cpp files as plain text, without full preprocessing, you should know that this is just playing around. Yes, such an analyzer can find something, but overall it is a toy that cannot be taken seriously.

So type information is needed both to detect errors and to avoid issuing false warnings. Information about classes is especially important.

Let's take a look at examples of how type information is used.

The first example demonstrates that type information is needed to detect an error when working with the fprintf function (the code is taken from the Cocos2d-x project):

WCHAR *gai_strerrorW(int ecode);
....
#define gai_strerror gai_strerrorW
....
fprintf(stderr, "net_listen error for %s: %s",
        serv, gai_strerror(n));

PVS-Studio warning: V576 Incorrect format. Consider checking the fourth argument of the fprintf function. The type of symbols is expected. ccconsole.cpp 341

The fprintf function expects a pointer of type char * as the fourth argument. It happens that the actual argument is a string of type wchar_t *.

To detect this error, the analyzer needs to know the type that gai_strerrorW returns. Without this information, the error cannot be detected.
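One way to print such a string correctly with the narrow printf family is the %ls conversion, which expects a wide-character string. The sketch below assumes a hypothetical helper formatError and an ASCII-only message, so that the wide-to-narrow conversion works in the default "C" locale; it is not the fix applied in Cocos2d-x.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: formats the same message as the Cocos2d-x fragment,
// but with %ls for the wide string. `wideMessage` stands in for the WCHAR*
// returned by gai_strerrorW.
int formatError(char* out, std::size_t size, const char* serv,
                const wchar_t* wideMessage) {
    // %s expects const char*, %ls expects const wchar_t*.
    return std::snprintf(out, size, "net_listen error for %s: %ls",
                         serv, wideMessage);
}
```

Writing into a buffer instead of stderr keeps the example easy to verify; the same format string works with fprintf.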

Now consider an example where knowledge of type information prevents a false warning from being issued.

The analyzer considers code of the form "*A = *A;" suspicious. However, it stays silent in the following situation:

volatile char *ptr;
....
*ptr = *ptr; // <= no V570 warning here

The volatile qualifier suggests that this is not a mistake but a deliberate choice: for some reason, the developer needs to "touch" a memory cell. We do not know why, but if the code does this, there is presumably a reason, and no warning should be issued.

Now let's look at an example of how to detect an error based on knowledge of the class.

The sample code is taken from the CoreCLR project:

struct GCStatistics
    : public StatisticsBase
{
  ....
  virtual void Initialize();
  virtual void DisplayAndUpdate();
  ....
GCStatistics g_LastGCStatistics;
....
memcpy(&g_LastGCStatistics, this, sizeof(g_LastGCStatistics));

PVS-Studio warning: V598 The 'memcpy' function is used to copy the fields of the 'GCStatistics' class. Virtual table pointer will be damaged by this. cee_wks gc.cpp 287.

Copying one object to another using the memcpy function is quite acceptable if the objects are POD structures. However, there are virtual methods in the class here, which means there is also a pointer to a table of virtual methods. Copying this pointer from one object to another is extremely dangerous.

So this diagnostic is possible because we know that the g_LastGCStatistics variable is an instance of a class and that this class is not a POD type.
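The same knowledge can be expressed in code with the standard type trait std::is_trivially_copyable. The sketch below is a hypothetical guard, not part of PVS-Studio or CoreCLR: a class with virtual functions, like GCStatistics, is rejected at compile time.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical guard: byte-wise copying is allowed only for trivially
// copyable types, which excludes classes with virtual functions (and
// therefore with a pointer to the virtual method table).
template <typename T>
void copyBytes(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is unsafe for this type; use copy assignment");
    std::memcpy(&dst, &src, sizeof(T));
}

struct PlainStats {            // POD-like: safe to copy byte-wise
    int count;
    double total;
};

struct VirtualStats {          // like GCStatistics: has a vtable pointer
    virtual ~VirtualStats() = default;
    virtual void Initialize() {}
    int count = 0;
};
```

Calling copyBytes with VirtualStats arguments would fail to compile with the static_assert message; such objects should be copied with their copy assignment operator instead.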

Symbolic execution

Symbolic execution makes it possible to evaluate the values of variables that can lead to errors and to perform range checking of values. In our articles, we sometimes call this the mechanism of evaluating virtual values: see, for example, the article "Finding errors by calculating virtual values".


Knowing the estimated values of the variables, you can identify errors such as:

memory leaks;
overflow;
array index out of bounds;
null pointer dereferences in C++ / null reference access in C#;
meaningless conditions;
division by 0;
and so on.

Let us see how knowing the possible values of variables helps find various errors. We start with a code fragment from the QuantLib project:

Handle<YieldTermStructure> md0Yts() {
  double q6mh[] = {
    0.0001,0.0001,0.0001,0.0003,0.00055,0.0009,0.0014,0.0019,
    0.0025,0.0031,0.00325,0.00313,0.0031,0.00307,0.00309,
    ........................................................
    0.02336,0.02407,0.0245 };   // 60
  ....
  for(int i=0;i<10+18+37;i++) {  // i < 65
    q6m.push_back(
      boost::shared_ptr<Quote>(new SimpleQuote(q6mh[i])));

PVS-Studio warning: V557 Array overrun is possible. The value of 'i' index could reach 64. markovfunctional.cpp 176

Here the analyzer knows the following:

the q6mh array contains 60 elements;
the loop counter i takes values in the range [0..64].

Knowing this, the V557 diagnostic detects an array index out of bounds in the expression q6mh[i].
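A toy version of this range check could look as follows. It assumes that each integer value is approximated by a single closed interval; real symbolic execution tracks much richer information, and the names Range, loopCounterRange and mayOverrun are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Toy value model: an integer expression approximated by [lo, hi].
struct Range {
    long lo;
    long hi;
};

// For a loop `for (int i = start; i < limit; i++)`, the counter takes the
// values [start, limit - 1] inside the body (assuming start < limit).
Range loopCounterRange(long start, long limit) {
    return Range{start, limit - 1};
}

// An access arr[i] is flagged when the index range can leave [0, size - 1].
bool mayOverrun(Range index, std::size_t size) {
    return index.lo < 0 || index.hi >= static_cast<long>(size);
}
```

For the QuantLib loop, loopCounterRange(0, 10 + 18 + 37) yields [0, 64], and mayOverrun reports it against an array of 60 elements.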

Now consider a situation where division by 0 may occur. The code is taken from the Thunderbird project:

static inline size_t UnboxedTypeSize(JSValueType type)
{
  switch (type) {
  .......
  default: return 0;
  }
}

MInstruction *loadUnboxedProperty(size_t offset, ....)
{
  size_t index = offset / UnboxedTypeSize(unboxedType);

PVS-Studio warning: V609 Divide by zero. Denominator range [0..8]. ionbuilder.cpp 10922

The UnboxedTypeSize function returns various values, including 0. Its result is used as a denominator without checking whether it is 0, which can lead to the variable offset being divided by zero.
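In the same toy interval model, a function can be summarized by the range of values it may return, and a division is flagged when zero lies inside the denominator's range. The branch values used below (0, 1, 2, 4, 8) are an assumption chosen to reproduce the [0..8] range from the warning; they are not taken from the Thunderbird code.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

struct Range {
    long lo;
    long hi;
};

// Summarizes a function by the values its branches may return.
// Precondition: `values` is not empty.
Range rangeOfReturns(std::initializer_list<long> values) {
    Range r{*values.begin(), *values.begin()};
    for (long v : values) {
        r.lo = std::min(r.lo, v);
        r.hi = std::max(r.hi, v);
    }
    return r;
}

// A division is flagged when 0 lies within the denominator's range.
bool mayDivideByZero(Range denominator) {
    return denominator.lo <= 0 && 0 <= denominator.hi;
}
```

An interval is a coarse summary (it also contains values such as 3 that no branch returns), but it is enough to notice that 0 is possible.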

The previous examples dealt with a range of integer values. However, the analyzer operates with values of other data types, for example, with strings and pointers.

Consider an example of incorrect string handling. Here the analyzer keeps track of the fact that an entire string has been converted to lowercase or uppercase. This allows it to detect situations like the following:

string lowerValue = value.ToLower();
....
bool insensitiveOverride = lowerValue == lowerValue.ToUpper();

PVS-Studio warning: V3122 The 'lowerValue' lowercase string is compared with the 'lowerValue.ToUpper ()' uppercase string. ServerModeCore.cs 2208

The programmer wanted to check that all characters in the string are uppercase. The code clearly contains a logical error, since all characters of this string had already been converted to lowercase.
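The sketch below reproduces this logic in C++, with ASCII-only helpers standing in for ToLower() and ToUpper(). It shows that, after lowercasing, the comparison succeeds only for strings that contain no letters at all. The isAllUpper function is an assumption about what the author may have intended.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// ASCII-only stand-ins for ToLower() and ToUpper().
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

std::string toUpper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

// The flagged logic: a lowercased string equals its uppercase form only
// if it contains no letters, so this cannot test "all uppercase".
bool buggyCheck(const std::string& value) {
    std::string lowerValue = toLower(value);
    return lowerValue == toUpper(lowerValue);
}

// A plausible intended check (an assumption): compare the original string
// with its uppercase form.
bool isAllUpper(const std::string& value) {
    return value == toUpper(value);
}
```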

I could go on describing diagnostics based on knowledge of variable values for a long time. I will give just one more example, related to pointers and memory leaks.

The code is taken from the WinMerge project:

CMainFrame* pMainFrame = new CMainFrame;
if (!pMainFrame->LoadFrame(IDR_MAINFRAME))
{
  if (hMutex)
  {
    ReleaseMutex(hMutex);
    CloseHandle(hMutex);
  }
  return FALSE;
}
m_pMainWnd = pMainFrame;

Analyzer Warning: V773 The function was exited without releasing the 'pMainFrame' pointer. A memory leak is possible. Merge merge.cpp 353

If the frame cannot be loaded, the function exits without destroying the object whose address is stored in the variable pMainFrame.

The diagnostic works as follows. The analyzer remembers that the pMainFrame pointer holds the address of an object created with the new operator. While traversing the control-flow graph, the analyzer reaches a return statement. At that point, the object has not been destroyed, and the pointer still refers to it, so when the function returns, the object becomes unreachable and the memory leaks.
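In C++ code, a common way to avoid this kind of leak is to let a smart pointer own the object until ownership is handed over. The sketch below uses a hypothetical stand-in for CMainFrame that only counts live instances; it is an illustration of the idiom, not the WinMerge fix.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the MFC frame class: it counts live instances
// so that a leak (or its absence) can be observed.
struct CMainFrame {
    static int alive;
    bool loadOk;
    explicit CMainFrame(bool ok) : loadOk(ok) { ++alive; }
    ~CMainFrame() { --alive; }
    bool LoadFrame() const { return loadOk; }
};
int CMainFrame::alive = 0;

CMainFrame* g_mainWnd = nullptr;  // plays the role of m_pMainWnd

// Leak-free variant: a unique_ptr owns the frame until ownership is
// transferred, so the early return destroys the object automatically.
bool initInstance(bool loadSucceeds) {
    auto pMainFrame = std::make_unique<CMainFrame>(loadSucceeds);
    if (!pMainFrame->LoadFrame()) {
        return false;                  // frame is deleted here
    }
    g_mainWnd = pMainFrame.release();  // ownership transferred
    return true;
}
```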

Method annotations

Annotation of methods provides more information about the methods used than can be obtained by analyzing only their signatures.


We have done a lot of work annotating functions:

C/C++. At the moment, 6570 functions have been annotated (standard C and C++ libraries, POSIX, MFC, Qt, ZLib, and so on).
C#. At the moment, 920 functions have been annotated.

Let us see how the memcmp function is annotated in the C++ core of the analyzer:

C_"int memcmp(const void *buf1, const void *buf2, size_t count);"
ADD(REENTERABLE | RET_USE | F_MEMCMP | STRCMP | HARD_TEST |
    INT_STATUS, nullptr, nullptr, "memcmp",
    POINTER_1, POINTER_2, BYTE_COUNT);

A brief explanation of the markup:

C_ - an auxiliary mechanism for checking the annotations themselves (unit tests);
REENTERABLE - a repeated call with the same arguments gives the same result;
RET_USE - the result should be used;
F_MEMCMP - enables certain buffer overflow checks;
STRCMP - the function returns 0 if the buffers are equal;
HARD_TEST - a special function: some libraries define their own functions with the same name in their namespace, so the namespace should be ignored;
INT_STATUS - the result must not be explicitly compared with 1 or -1;
POINTER_1, POINTER_2 - the pointers must be non-null and different;
BYTE_COUNT - the parameter specifies a number of bytes and must be greater than 0.
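To illustrate how such markup can drive diagnostics, here is a hypothetical, much-simplified annotation table with a check for two of the properties listed above. The names (Flag, ArgRole, checkCall) are invented for this sketch and do not reflect how the PVS-Studio core is written. In particular, a real analyzer evaluates argument expressions, while this toy only compares their source text and recognizes a literal 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical flags loosely inspired by the markup above.
enum Flag : std::uint32_t {
    REENTERABLE = 1u << 0,
    RET_USE     = 1u << 1,
    INT_STATUS  = 1u << 2,
};

enum class ArgRole { Pointer1, Pointer2, ByteCount };

struct Annotation {
    std::uint32_t flags;
    std::vector<ArgRole> args;  // role of each parameter, in order
};

const std::map<std::string, Annotation>& annotations() {
    static const std::map<std::string, Annotation> table = {
        {"memcmp", {REENTERABLE | RET_USE | INT_STATUS,
                    {ArgRole::Pointer1, ArgRole::Pointer2, ArgRole::ByteCount}}},
    };
    return table;
}

// Checks a call, given the source text of its arguments.
std::vector<std::string> checkCall(const std::string& func,
                                   const std::vector<std::string>& argText) {
    std::vector<std::string> warnings;
    auto it = annotations().find(func);
    if (it == annotations().end()) return warnings;  // not annotated
    const std::vector<ArgRole>& roles = it->second.args;
    std::string p1, p2;
    for (std::size_t i = 0; i < roles.size() && i < argText.size(); ++i) {
        if (roles[i] == ArgRole::Pointer1) p1 = argText[i];
        if (roles[i] == ArgRole::Pointer2) p2 = argText[i];
        if (roles[i] == ArgRole::ByteCount && argText[i] == "0")
            warnings.push_back("BYTE_COUNT argument is 0");
    }
    if (!p1.empty() && p1 == p2)
        warnings.push_back("POINTER_1 and POINTER_2 arguments are identical");
    return warnings;
}
```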

These annotations are used by many diagnostics. Consider some of the errors that we found in the application code due to the above markup for the memcmp function.

An example of using the INT_STATUS markup. CoreCLR project:

bool operator()(const GUID& _Key1, const GUID& _Key2) const
  { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }

V698 Expression 'memcmp (....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp (....) <0' instead. sos util.cpp 142

Such code may work, but in general it is incorrect. The memcmp function returns 0, a value greater than zero, or a value less than zero. Important:

"greater than zero" is not necessarily 1;
"less than zero" is not necessarily -1.

So there is no guarantee that this code works. The comparison may break at any time, for example after switching to another compiler or changing the optimization settings.
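A minimal sketch of a corrected comparator, using a hypothetical 16-byte Key type in place of GUID: the sign of the result is tested instead of comparing it with -1.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical 16-byte key standing in for GUID.
struct Key {
    unsigned char bytes[16];
};

// memcmp only promises a negative, zero or positive result,
// so the comparator tests the sign rather than the value -1.
struct KeyLess {
    bool operator()(const Key& a, const Key& b) const {
        return std::memcmp(&a, &b, sizeof(Key)) < 0;
    }
};
```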

The INT_STATUS flag helps to reveal another type of error. Firebird project code:

SSHORT TextType::compare(ULONG len1, const UCHAR* str1,
                         ULONG len2, const UCHAR* str2)
{
  ....
  SSHORT cmp = memcmp(str1, str2, MIN(len1, len2));

  if (cmp == 0)
    cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0));
  return cmp;
}

PVS-Studio warning: V642 Saving the 'memcmp' function result inside the 'short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. texttype.cpp 3

Again, the result returned by memcmp is handled carelessly. The error is that the value is truncated: the result is stored in a variable of type short.

Some may think that we are nitpicking. Not at all. Such sloppy code can easily become the cause of a real vulnerability.

One such error caused a serious vulnerability in MySQL/MariaDB up to versions 5.1.61, 5.2.11, 5.3.5 and 5.5.22. The cause was the following code in the 'sql/password.c' file:

typedef char my_bool;
....
my_bool check(...) {
  return memcmp(...);
}

In short: when a MySQL/MariaDB user connects, a token (an SHA hash of the password and hash) is computed and compared with the expected value using memcmp. On some platforms, the return value may fall outside the range [-128..127]. As a result, in 1 case out of 256, the comparison of the hash with the expected value returns true regardless of the hash. Consequently, a simple bash command gives an attacker root access to a vulnerable MySQL server even without knowing the password. A more detailed description of this problem can be found here: Security vulnerability in MySQL/MariaDB.
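Both the Firebird and the MySQL/MariaDB defects come from squeezing the int result of memcmp into a narrower type. A minimal sketch of a safer pattern, assuming a caller that really needs a short: keep the raw result in an int and map it to -1, 0 or 1 before narrowing. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Maps any comparison result to -1, 0 or 1. Converting the raw memcmp
// result to short or char instead can change its value, and can even turn
// a non-zero result into 0.
int normalizeCmp(int r) {
    return (r > 0) - (r < 0);
}

// Hypothetical rework of the Firebird-style comparison: the raw result
// stays in an int, and only the normalized value is narrowed.
short compareBytes(const unsigned char* s1, std::size_t len1,
                   const unsigned char* s2, std::size_t len2) {
    int cmp = std::memcmp(s1, s2, len1 < len2 ? len1 : len2);
    if (cmp == 0)
        cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0));
    return static_cast<short>(normalizeCmp(cmp));  // -1, 0 or 1 fits
}
```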

An example of using the BYTE_COUNT markup. The GLG3D project:

bool Matrix4::operator==(const Matrix4& other) const {
  if (memcmp(this, &other, sizeof(Matrix4) == 0)) {
    return true;
  }
  ....
}

PVS-Studio warning: V575 The 'memcmp' function processes '0' elements. Inspect the 'third' argument. graphics3D matrix4.cpp 269

The third argument of the memcmp function is marked as BYTE_COUNT, which means it must not be 0. In this code fragment, the third actual argument is exactly 0.

The error is a misplaced closing parenthesis. As a result, the third argument is the expression sizeof(Matrix4) == 0, whose value is false, i.e. 0.

An example of using the POINTER_1 and POINTER_2 markup. The GDB project:

static int
psymbol_compare (const void *addr1, const void *addr2,
                 int length)
{
  struct partial_symbol *sym1 = (struct partial_symbol *) addr1;
  struct partial_symbol *sym2 = (struct partial_symbol *) addr2;

  return (memcmp (&sym1->ginfo.value, &sym1->ginfo.value,
                  sizeof (sym1->ginfo.value)) == 0
          && .......

PVS-Studio warning: V549 The first argument of the memcmp function is equal to the second argument. psymtab.c 1580

The first and second arguments are marked as POINTER_1 and POINTER_2. First, this means that they must not be NULL. In this case, however, we are interested in the second property of the markup: the two pointers must not be the same, as indicated by the suffixes _1 and _2.

Because of a typo, the &sym1->ginfo.value buffer is compared with itself. Based on the markup, PVS-Studio easily detects this error.

An example of using the F_MEMCMP markup.

This markup enables a number of special diagnostics for functions such as memcmp and __builtin_memcmp. As a result, the following error was found in the Haiku project:

dst_s_read_private_key_file(....)
{
  ....
  if (memcmp(in_buff, "Private-key-format: v", 20) != 0)
    goto fail;
  ....
}

PVS-Studio warning: V512 A call of the 'memcmp' function will lead to underflow of the buffer '"Private-key-format: v"'. dst_api.c 858

The string "Private-key-format: v" consists of 21 characters, not 20. Thus, fewer bytes are compared than required.
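Hand-counted lengths are easy to get wrong. A common remedy, shown here as a hypothetical helper, is to let the compiler derive the length from the literal: a literal of N bytes includes the terminating null, so N - 1 characters are compared. The sketch assumes the buffer holds at least that many bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: N is deduced from the literal and includes the
// terminating '\0', so exactly N - 1 characters are compared.
// Assumes `buffer` holds at least N - 1 readable bytes.
template <std::size_t N>
bool startsWith(const char* buffer, const char (&prefix)[N]) {
    return std::memcmp(buffer, prefix, N - 1) == 0;
}
```

With the hand-written count of 20, the 21st character is never compared, so a header ending in "x" instead of "v" would still pass; startsWith compares all 21 characters.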

An example of using the REENTERABLE markup. To be honest, the word "reenterable" does not quite reflect the essence of this flag. However, all the developers on our team are used to it and do not want to change the code just for the sake of appearance.

The essence of the markup is as follows: the function has no state and no side effects. It does not modify memory, print anything to the screen, or delete files on disk. Thanks to this, the analyzer can distinguish correct constructs from incorrect ones. For example, this code is perfectly legitimate:

 if (fprintf(f, "1") == 1 && fprintf(f, "1") == 1)

The analyzer will not issue a warning: the code writes two "1" characters to the file, so it cannot be reduced to:

if (fprintf(f, "1") == 1)

The following code, however, is redundant, and the analyzer will flag it, since the cosf function has no state and writes nothing anywhere:

 if (cosf(a) > 0.1f && cosf(a) > 0.1f)
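A toy version of the resulting rule: identical operands of && or || are reported only when the function is marked as having no state and no side effects. The kReenterable set and the Call structure are hypothetical simplifications, not PVS-Studio's data structures.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical set of functions annotated as having no state and no side
// effects (the REENTERABLE property described above).
const std::set<std::string> kReenterable = {"cosf", "memcmp", "strlen"};

// Simplified call representation: function name plus argument source text.
struct Call {
    std::string function;
    std::string args;
};

// For `lhs && rhs` or `lhs || rhs` with identical calls, the second call is
// redundant only if the function has no side effects; calls such as
// fprintf(f, "1") legitimately run twice.
bool identicalOperandsSuspicious(const Call& lhs, const Call& rhs) {
    return lhs.function == rhs.function && lhs.args == rhs.args &&
           kReenterable.count(lhs.function) != 0;
}
```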

Let us now return to the memcmp function and see what error we managed to detect using the considered markup in the PHP project:

if ((len == 4) /* sizeof (none|auto|pass) */ &&
    (!memcmp("pass", charset_hint, 4) ||
     !memcmp("auto", charset_hint, 4) ||
     !memcmp("auto", charset_hint, 4)))

PVS-Studio warning: V501 There are identical sub-expressions '!memcmp("auto", charset_hint, 4)' to the left and to the right of the '||' operator. html.c 396

The code checks twice that the buffer contains the word "auto". This code is redundant, and the analyzer assumes it contains an error. Indeed, the comment tells us that the comparison with the string "none" is missing.

As you can see, markup helps find many interesting errors. Analyzers often let users annotate functions themselves. In PVS-Studio, these capabilities are limited: only a few diagnostics support user annotations. One example is the V576 diagnostic, which finds errors in the use of formatted output functions (printf, sprintf, wprintf, and so on).

We deliberately do not develop the mechanism of user annotations. There are two reasons for this:

In a large project, no one will waste time marking up functions. It is simply unrealistic when you have 10 million lines of code, and the PVS-Studio analyzer is focused on medium and large projects.
If functions from some well-known library are not marked up, then it is better to write to us and we ourselves will annotate them. Firstly, we will do it faster and better, and secondly, the markup results will be available to all our users.

One more time about technology.

Let me briefly summarize the technologies described above. PVS-Studio uses:

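Pattern-based analysis on the basis of an abstract syntax tree: used to find places in the source code that are similar to known code patterns with an error.
Type inference based on the semantic model of the program: gives the analyzer complete information about all variables and expressions in the code.
Symbolic execution: used to evaluate the values of variables that can lead to errors and to perform range checking of values.
Data-flow analysis: used to evaluate the constraints imposed on variable values when various language constructs are processed, for example, the values a variable can take inside if/else blocks.
Method annotations: provide more information about the methods used than can be obtained by analyzing only their signatures.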

Based on these technologies, the analyzer can detect the following classes of errors in C, C ++ and C # programs:

64-bit errors;
the address of a local variable is returned from a function by reference;
arithmetic overflow and underflow;
array index out of bounds;
double release of resources;
dead code;
micro-optimizations;
unreachable code;
uninitialized variables;
unused variables;
incorrect shift operations;
undefined/unspecified behavior;
incorrect handling of types (HRESULT, BSTR, BOOL, VARIANT_BOOL);
misconceptions about how a function/class works;
typos;
missing virtual destructor;
code formatting that does not match the program's logic;
errors caused by copy-paste;
exception handling errors;
buffer overflow;
security issues;
confusion over operator precedence;
null pointer/null reference dereference;
dereferencing parameters without a prior check;
synchronization errors;
errors in using WPF;
memory leaks;
integer division by zero;
diagnostics created at users' special request.

Conclusion: the PVS-Studio analyzer is a powerful error-finding tool that uses a modern arsenal of detection methods.

PVS-Studio is a positive superhero

Yes, PVS-Studio is the positive superhero of the software world.

Testing PVS-Studio

Developing code analyzers is impossible without constant, thorough testing. In developing PVS-Studio, we use 7 different testing methods:

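Static code analysis on developers' machines. Every developer has PVS-Studio installed. New and modified code is checked right away by incremental analysis. Both C++ and C# code are checked.
Static code analysis during nightly builds. If a warning was missed, it will show up during the nightly build on the server. PVS-Studio checks both C# and C++ code. In addition, we use Clang to check the C++ code.
Unit tests at the level of classes, methods and functions. This system is not very well developed: some situations are hard to test because they require preparing a large amount of input data. We rely mostly on high-level tests.
Functional tests on specially prepared files with marked-up errors.
Functional tests verifying that we parse the main system header files correctly.
Regression tests on individual third-party projects and solutions. This is the most important and useful kind of testing for us. By comparing old and new analysis results, we make sure we have not broken anything and fine-tune new diagnostics. To do this, we regularly check open-source projects. The C++ analyzer is tested on 120 projects under Windows (Visual C++) and on another 24 projects under Linux (GCC). The C# analyzer's test base is somewhat smaller: 54 projects.
Functional tests of the user interface: the plugin integrated into the Visual Studio environment.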

Conclusion

This article was written to popularize the static analysis methodology. I think readers are interested not only in the results of using code analyzers but also in how analyzers work inside. I will try to write articles on this topic from time to time.

We also plan to take part more often in events such as conferences and seminars, and we will be happy to receive invitations, especially to events in Moscow and St. Petersburg. For example, if your institute or company holds meetings where programmers share their experience, we can come and give a talk on an interesting topic: modern C++, how we develop analyzers, typical programmer errors and how to prevent them, refining a coding standard, and so on. Please send invitations to me at karpov [@] viva64.com.

Finally, some links:

If you want to share this article with an English-speaking audience, please use the link to the translation: Andrey Karpov. How PVS-Studio does the bug search: methods and technologies.

Read the article and have a question?

Readers of our articles often ask the same questions. We have collected the answers here: Answers to questions from readers of articles about PVS-Studio, version 2015. Please take a look at the list.

Source: https://habr.com/ru/post/319382/
