
Valgrind is good, but not enough

Not so long ago, we tried to demonstrate to one company the benefits of using the PVS-Studio static analyzer. Nothing came of it. But in the course of the correspondence, I prepared a detailed answer about the methodologies of static and dynamic analysis. Now I have decided to publish this answer as a small article. I think readers will find the text interesting, and we will also be able to use this article when talking to new potential customers.

So, in the course of the correspondence, we were asked roughly the following question:

We have already experimented with static analyzers and concluded that their accuracy is much worse than that of plain valgrind. So it is unclear what static analysis is for. It produces a lot of false positives, and the errors that valgrind does not find it usually does not find either.

I prepared the following answer, which I quote here with only minor edits:
It is quite difficult to show the benefits of static analysis on two small projects. First, they are high-quality projects. Second, static analysis is primarily aimed at finding and eliminating errors in new code. Third, the error density in small projects is lower than in large ones (explanation).

Trying to find something in long-standing, consistently working code is a rather thankless task. The point of static analysis is to prevent many errors at the earliest stages. Yes, most of these errors can be found by other methods: they will be noticed by the programmer himself, or caught by tests or testers. In the worst case, users will report them. But in any case, that is wasted time. Many typos, copy-paste errors, and other blunders can be eliminated at the earliest stage with the help of static analysis. Its main value is finding many errors right after the code is written. Finding an error at any later stage costs many times more.

For some reason, at this point everyone immediately says that their programmers do not make typos or copy-paste mistakes. That is not true. They do. Everyone does: http://www.viva64.com/ru/b/0260/

Well, suppose we have convinced you that static analysis can find some errors. A fair question follows: is it even necessary, given that there are tools like valgrind, which really do give fewer false positives?

Unfortunately, yes, it is. There is no single technology that can detect all types of errors. Sad as it is, to improve quality you have to use different kinds of tools, so that they complement each other.

We have already written about how static analysis complements other technologies. For example, the difference between static and dynamic code analysis is described in this note: http://www.viva64.com/ru/b/0248/

And here is a note on how static analysis complements unit testing: http://www.viva64.com/ru/a/0080/

However, to avoid being entirely abstract, let us illustrate the difference between static and dynamic analysis with examples. Take, for instance, this interesting fragment from the constructor of the SlowScanner class:
```cpp
class SlowScanner {
  ....
  explicit SlowScanner(Fsm& fsm)
  {
    ....
    Fill(m_letters,
         m_letters + sizeof(m_letters)/sizeof(*m_letters), 0);
    ....
  }
  ....
  size_t* m_letters;
  ....
};
```

The PVS-Studio analyzer issues a warning: V514 Dividing sizeof a pointer 'sizeof (m_letters)' by another value. There is a possibility of logical error presence. slow.h 238

Most likely, the 'm_letters' class member was once a fixed-size array. This is just a guess, of course, but a very likely one. For example, it was probably first written as something like: size_t m_letters[MAX_COUNT];. Back then, computing the array size this way was correct:
```cpp
sizeof(m_letters) / sizeof(*m_letters)
```

Later the array became dynamic, and the 'm_letters' variable turned into a plain pointer. Now the expression "sizeof(m_letters) / sizeof(*m_letters)" always equals one. On a 32-bit system, the size of a pointer and the size of the size_t type are both 4 bytes. On a 64-bit system, both sizes are 8 bytes. But whether we divide 4 by 4 or 8 by 8, we always get 1.

Thus, the Fill() function zeroes only the first element of the array. The error may not show up at all if the memory happens to be zeroed already, or if the uninitialized elements are never used. That is what makes it so insidious. Still, a read of an uninitialized array element may occur.

Can a dynamic analyzer find this error? I do not know. Perhaps it can detect the read from uninitialized memory. Why, then, is it silent? Here we come to one of the important differences between static and dynamic analysis.

Most likely, this code branch is rarely executed, or at least is not covered by tests. As a result, the dynamic analyzer simply never checks this code and does not see the error. The weakness of dynamic analysis is that it is difficult to cover all possible code branches; rarely used code remains unchecked, especially various error handlers and handlers for unusual situations.

Static analysis checks all branches that can theoretically receive control. It can therefore find errors regardless of how often a particular piece of code executes.

By the way, let us digress from the topic a little. Besides the analyzer itself, we also offer code audit services. Such an audit can result in a document with a set of recommendations for improvement, which may well be included in a coding standard. We already have experience with such work. For example, to avoid errors related to computing array sizes, we may recommend a special technique (borrowed from Chromium):
```cpp
template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))
```

The 'arraysize' macro cannot be applied to an ordinary pointer: a compilation error will occur. This protects us from accidental mistakes. If an array suddenly turns into a pointer, it is impossible to overlook the places where its size is computed.

Let's return to static and dynamic analysis. Take, for example, this function:
```cpp
inline RECODE_RESULT _rune2hex(wchar32 in,
    char* out, size_t out_size, size_t &out_writed)
{
    static const char hex_digs[] = "0123456789ABCDEF";
    bool leading = true;
    out_writed = 0;
    RECODE_RESULT res = RECODE_OK;
    for (int i = 7; i >= 0; i--) {
        unsigned char h = (unsigned char)(in >> (i*4) & 0x0F);
        if (h || !leading || i == 0) {
            if (out_writed + 1 >= out_size) {
                res = RECODE_EOOUTPUT;
                break;
            }
            out[out_writed++] = hex_digs[h];
        }
    }
    return res;
}
```

From the point of view of dynamic analysis, there is nothing suspicious here. The PVS-Studio static analyzer, in turn, suggests looking at the 'leading' variable: V560 A part of conditional expression is always false: !leading. recyr_int.hh 220

I think there is no error here: the 'leading' variable simply became redundant after refactoring. But what if not? What if some code was meant to be added? This is a place worth examining. And if the variable really is redundant, it should be removed, so that it confuses neither the analyzer nor the people who will maintain this code.

Warnings that part of an expression is always a constant may seem uninteresting. If so, look at the examples of errors found with the V560 diagnostic, and you will be surprised at what turns up in code: http://www.viva64.com/en/examples/V560/

Dynamic analysis cannot find such errors: there is nothing for it to look for. They are simply incorrect logical expressions.

Unfortunately, the proposed projects do not let us demonstrate the advantages of a static analyzer in full. So let us take one of the libraries included in the project. After all, in a sense, an error in a library is an error in the project itself.

Consider the sslDeriveKeys function, which works with private data:
```cpp
int32 sslDeriveKeys(ssl_t *ssl)
{
    ....
    unsigned char buf[SSL_MD5_HASH_SIZE + SSL_SHA1_HASH_SIZE];
    ....
    memset(buf, 0x0, SSL_MD5_HASH_SIZE + SSL_SHA1_HASH_SIZE);

    psFree(ssl->sec.premaster);
    ssl->sec.premaster = NULL;
    ssl->sec.premasterSize = 0;

skipPremaster:
    if (createKeyBlock(ssl, ssl->sec.clientRandom,
            ssl->sec.serverRandom, ssl->sec.masterSecret,
            SSL_HS_MASTER_SIZE) < 0)
    {
        matrixStrDebugMsg("Unable to create key block\n", NULL);
        return -1;
    }
    return SSL_HS_MASTER_SIZE;
}
```

A dynamic analyzer will find nothing here. From the language's point of view, the code is absolutely correct. To find the error, one has to reason in terms of higher-level patterns, which static analyzers can do.

We are interested in the local array 'buf'. Since it stores private data, the function attempts to zero it with the memset() function before exiting. And this is exactly where the error lies.

Look: after the memset() call, the local array 'buf' is no longer used. This means the compiler is allowed to remove the memset() call, since from the C/C++ point of view it has no observable effect. Moreover, the compiler is not just allowed to: it really does remove this call in release builds.

As a result, the private data stays in memory and could theoretically end up where it should not. The project thus becomes a little less secure because of an error in a third-party library.

PVS-Studio will issue the following warning: V597 The compiler could delete the memset function call, which is used to flush the buf buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sslv3.c 123

This error is a potential vulnerability. It may seem insignificant, but it can lead to very unpleasant consequences, up to fragments of private data being sent over the network. How such miracles happen is described in an article by ABBYY specialist Dmitry Meshcheryakov: http://habrahabr.ru/company/abbyy/blog/127259/

We hope we have managed to show the difference between a static and a dynamic code analyzer. The two approaches complement each other well. The fact that static analysis produces many false positives is not a problem: they can be worked through and eliminated by configuring the analyzer. If you are interested, we can do such work to bring the number of warnings down to a comfortable level.

If we have caught your interest, we suggest outlining the next steps toward possible cooperation and a demonstration of the analyzer's capabilities on large, live, real-world projects.

Source: https://habr.com/ru/post/236293/
