
As you know, our main activity is the development of the PVS-Studio code analyzer. And although we have been doing this successfully for a long time and, as it seems to us, we recently had an unusual thought. Still, we do not use our tools in the same way as our clients. No, of course, we check the PVS-Studio code with the help of PVS-Studio. But frankly, the PVS-Studio project is not that big. And working with the PVS-Studio code is different in style and character from, for example, working with the Chromium or LLVM code.
We wanted to be in the shoes of our clients in order to understand how our tool is used in long-term projects. After all, checking projects that we do regularly and about which we write a lot of
articles is just the style of using the analyzer that we actively oppose. It is wrong to launch a one-time analyzer on a project, correct several errors and repeat it in a year. When writing code, the analyzer should be used regularly, every day.
Well, okay, what's all this? Our theoretical desires to try ourselves in other projects coincided with practical proposals that gradually began to come to us. Last year we decided to single out in our company a team that would be engaged - oh, horror! - development to order. That is, I participated in third-party projects as programmers. Moreover, it was interesting for us to participate in long-term and rather large projects, i.e. at least 2-3 developers and at least 6 months of development. We had two goals:
- try an alternative type of business (custom development in addition to product development);
- see for yourself how to use PVS-Studio in long-term projects.
Both the first and second tasks were successful. But this article is not about custom development business, but about our experience. This is not an organizational experience. About this and so many articles. Pro experience with the code of other projects. About this we want to tell.
Seen interesting moments
Third-party projects teach us a lot. I think we will dedicate more than one article to this topic. And now we start with some interesting observations. We noticed 3 bright features of the code, which showed themselves in large and old projects. We are sure that over time there will be publications about other observations.
')
We have quite a lot of
articles devoted to changing the platform capacity from 32-bit to 64-bit. A lot of errors are connected with porting, which start to manifest themselves in 64-bit programs. We call them
64-bit errors .
But besides switching to 64-bit systems, there were other changes in the code that are not so noticeable. This happened in connection with the development of compilers, libraries and the maturing of the projects themselves. The consequences of these changes are well marked in the code, which has a long history of development. We offer to discuss them. I think it will be interesting and useful. Perhaps someone will want to review their old code to identify similar problems.
The considered error patterns were noticed by us, thanks to the PVS-Studio analyzer. Many of these errors are hidden. The code almost works due to lucky coincidence. However, each such error is a small time bomb that can work at any time.
Note. To avoid NDA restrictions, we changed the names and edited the code. Actually, this is not the code that was in the program. However, "everything is based on real events."
Change in the behavior of the new operator
Long ago, the 'new' operator returned NULL in case of a memory allocation error. Then the compilers began to support the modern C ++ standard, and throwing exceptions to std :: bad_alloc in such situations. You can force the 'new' operator to return NULL, but this is not the case now.
Apparently, these changes were noticed very superficially by the community of programmers. We took note of this fact and began to write code taking into account the new behavior. There are, of course, programmers who still do not know that the 'new' operator throws an exception. But we are talking about normal, adequate programmers. Individuals who do not want to know anything and continue to write in style, as they did 20 years ago, are not interested in us.
However, even those who know that the 'new' operator has changed its behavior have not reviewed the existing code. Someone just did not realize it. Someone wanted, but he had no time, and then he forgot about it. As a result, incorrect handlers of situations continue to live in a huge number of programs when memory cannot be allocated.
Some code snippets are harmless:
int *x = new int[N]; if (!x) throw MyMemoryException();
In the first case, instead of MyMemoryException, an exception std :: bad_alloc will be generated. Most likely, these exceptions are handled in the same way. No problem.
In the second case, the check will not work. The function will not return the value 'false'. Instead, an exception will be thrown, which will somehow be handled later. This is mistake. The behavior of the program has changed. But in practice, this almost never leads to problems. Simply, the program will react a little differently to the lack of memory.
It is more important to warn about cases where the behavior of the program does not change very slightly, but quite significantly. Such situations in large old projects can be found a huge amount.
Here are a couple of examples where certain actions must be performed when there is a shortage of memory:
image->SetBuffer(new unsigned char[size]); if (!image->GetBuffer()) { CloseImage(image); return NULL; } FormatAttribute *pAttrib = new FormatAttribute(sName, value, false); if (pAttrib ) { m_attributes.Insert(pAttrib); } else { TDocument* pDoc = m_state.GetDocument(); if (pDoc) pDoc->SetErrorState(OUT_OF_MEMORY_ERROR, true); }
This code is much more dangerous. For example, a user may lose the contents of his document in case of insufficient memory, although it was quite possible to give the opportunity to save the document to a file.
Recommendation. Find in your program all the 'new' statements. Check if the program is going to take some action if the pointer is zero. Rewrite such places.
The PVS-Studio analyzer helps to detect meaningless checks using the
V668 diagnostics.
Replacing char * with std :: string
With the transition from C to C ++ and with the growing popularity of the standard library, string classes such as std :: string have become widely used in programs. This is understandable and explainable. It is easier and safer to work with a full line, and not with the "char *" pointer.
The class of lines began to be used not only in the new code, but also to change some old fragments. This is exactly what troubles may arise. It is enough to weaken attention, and the code becomes dangerous or completely incorrect.
But let's not start with the terrible, but rather amusing. Sometimes there are such inefficient cycles:
for (i = 0; i < strlen(str.c_str()); ++i)
Obviously, once the variable 'str' was an ordinary pointer. It is bad to call the strlen () function at each iteration of the loop. This is extremely inefficient on long lines. However, after turning the pointer into std :: string, the code began to look even more stupid.
It is immediately obvious that the replacement of types occurred thoughtlessly. This can lead to such inefficient code or errors in general. Let's talk about errors:
wstring databaseName; .... wprintf("Error: cannot open database %s", databaseName);
In the first case, the pointer 'wchar_t *' turned into 'wstring'. The trouble will arise if you can not open the database, and you will need to print a message. The code is compiled, but the screen will be printed abracadabra. Or the program will fall at all. The reason is that you forgot to add a call to the c_str () function. Correct option:
wprintf("Error: cannot open database %s", databaseName.c_str());
The second case is generally epic. As it is not surprising, but such code is successfully compiled. A very popular string class CString is used. The CString class can be implicitly converted to a pointer. That is what is happening here. The result is a double buffer removal.
Recommendation. If you change pointers to a string class, do it carefully. Do not use bulk replacement without viewing each case. The fact that the code is compiled does not mean that it works. It is better to leave the code using pointers alone, unless there is a clear need to edit it. Let the code work better with pointers than incorrectly with classes.
The PVS-Studio analyzer helps to detect some of the errors that occurred due to the replacement of pointers to classes. The diagnostics of
V510 ,
V515 , etc. can help in this. But you should not completely rely on the analyzers. An extremely creative code can be caught, in which not only the analyzer, but also an experienced programmer can hardly find an error.
Replacing char with wchar_t
The project is developing, there is a desire to make the application interface multilingual. And at some point there is a massive replacement of char by wchar_t. Fixed errors issued by the compiler. "Ready" unicode version of the application.
In practice, after such a replacement, the application turns into a sieve. Errors that arise after such a replacement can manifest themselves for decades and are difficult to reproduce.
How can this be? Very simple. Many code snippets are not ready for the character size to change. The code is compiled without errors and without warnings. But it works only on "50%". Now you understand what we mean.
Note. Let me remind you that we do not intimidate the bad code received from students. We talk about the harsh realities of programmer life. In large and old projects, such errors inevitably appear. And there are hundreds of them. Seriously. Hundreds.
Examples:
wchar_t tmpBuffer[500]; memset(tmpBuffer, 0, 500);
In the first case, the programmer did not think at all that the size of the symbol would change with time. Therefore cleared only half of the buffer. That's where my words about 50%.
In the second case, the programmer guessed that the symbol size would change. However, the hint "* sizeof (char)" did not help with the mass replacement of types. It was necessary to do wrong. The correct option is:
memset(charArray, 0, MAX_MSG_LENGTH * sizeof(charArray[0]));
The third example. Types are fine. To calculate the length of a string, use the _tcslen () function. At first glance, everything is fine. However, when the characters began to have the type 'wchar_t', the program again began to work at 50%.
Allocated 2 times less memory than necessary. In fact, the program will work successfully until the message is longer than half the maximum possible length. An unpleasant error that lingered in the code for a long time.
Fourth example. Corrected functions printf (), sprintf () and so on, but forgot to check frintf (). As a result, garbage is written to the file. It was necessary to do something like this:
fwprintf(fp, L"%i : %s", i, name);
Or so, if it is an ordinary ASCII file:
fprintf(fp, "%i : %S", i, name);
Note. Now I thought that I was talking about 50%, from the point of view of Windows. In Linux, the wchar_t type occupies not 2, but 4 bytes. So in the Linux world, the program will work only by 25% at all :).
Recommendation. If you have already replaced char with wchar_t en masse, we don’t know what can help. This could make a lot of interesting mistakes. Carefully review all the places where wcahr_t is used is not realistic. In part, you will be helped by a static code analyzer. Part of the mistakes he will reveal The errors shown above will be found by the PVS-Studio analyzer. The consequences of replacing char with wchar_t are quite diverse and are revealed by various diagnostics. Enumerate them does not make sense. As an example, you can call
V512 ,
V576 ,
V635 , etc.
If you haven’t done a replacement yet, but you are planning to, then approach it seriously. Replacing types automatically and correct compilation errors will take, for example, 2 days. Making the same replacements manually, with viewing the code, will take an order of magnitude longer. For example, you will spend on it 2 weeks. But it is better than to catch all the new errors after 2 years:
- incorrect formatting of strings;
- exceeding the bounds of allocated memory buffers;
- working with buffers cleared in half;
- work with buffers copied in half;
- and so on.
Conclusion
We were pleased with our experience of participating in third-party custom development, so it allowed us to look at the world with different eyes. We will continue our participation in third-party projects as developers, so if anyone is interested in chatting with us on this topic, please write. If you are afraid that later there will be an exposing article about you (about the errors found), then anyway,
write - so help us all with the NDA!