How to reduce the likelihood of errors at the stage of writing code. Note N1

I got to the code of the well-known instant messaging client Miranda IM . Together with various plugins, this is a fairly large project, the size of which is about 950 thousand lines of code in C and C ++. And, as in any solid project with a history of development, it contains a considerable number of errors and typos.

Considering defects in various applications, I noticed some patterns. And now, using the example of defects found in Miranda IM, I will try to formulate some recommendations that will allow you to avoid many errors and typos at the stage of writing code.

')
For the analysis of Miranda IM, I used (yes, you guessed it) the PVS-Studio 4.14 analyzer. The code of the project Miranda IM is very high quality, as evidenced by the popularity of this program. I myself am a user of this client and have no complaints about it in terms of quality. The project is collected in Visual Studio with Warning Level 3 (/ W3), and the number of comments is 20% of the entire program text.

1. Avoid memset, memcpy, ZeroMemory and similar functions.

We will start with errors that occur when using low-level memory functions such as memset, memcpy, ZeroMemory, and the like.

I recommend to avoid these functions in every possible way. It is clear that you should not literally follow my advice and replace all these functions with cycles. But I have seen such a huge number of errors associated with the use of these functions, that I highly recommend to be careful with them and use them only when necessary. In my opinion there are only two cases where the use of these functions is justified:

1) Processing large arrays. That is, where the optimized algorithm of the function will give a gain compared with the processing of elements in a simple cycle.

2) Processing a large number of small arrays. It also makes sense in terms of performance improvement.

In all other cases, it is better to try to do without them. For example, I consider these functions superfluous in a program such as Miranda. There are no resource-intensive algorithms or huge arrays in it. Consequently, the use of memset / memcpy functions derives only from the convenience of writing short code. However, this simplicity is very deceptive and, having saved a few seconds on writing code, you can catch for weeks a vaguely manifested memory corruption error. Consider a few code examples taken from the Miranda IM project.

V512 A call of the memcpy function will lead to a buffer overflow or underflow. tabsrmm utils.cpp 1080

  typedef struct _textrangew
 {
   CHARRANGE chrg;
   LPWSTR lpstrText;
 } TEXTRANGEW;

 const wchar_t * Utils :: extractURLFromRichEdit (...)
 {
   ...
   :: CopyMemory (tr.lpstrText, L "mailto:", 7);
   ...
 }

Copy only a piece of string. The error is trivial to a disgrace, but this does not cease to be an error. Most likely, a string consisting of 'char' was used here before. Then we switched to Unicode strings, but forgot to change the constant.

If you initially use to copy the string functions that are intended for this, then such an error simply can not occur. Imagine what it was written like this:

  strncpy (tr.lpstrText, "mailto:", 7);

Then, when moving to Unicode strings, the number 7 does not need to be changed:

  wcsncpy (tr.lpstrText, L "mailto:", 7);

I'm not saying that this is the perfect code. But it is already much better than using CopyMemory. Consider another example.

V568 It's odd that the sizeof () operator is the '& ImgIndex' expression. clist_modern modern_extraimage.cpp 302

  void ExtraImage_SetAllExtraIcons (HWND hwndList, HANDLE hContact)
 {
   ...
   char * (ImgIndex [64]);
   ...
   memset (& ImgIndex, 0, sizeof (& ImgIndex));
   ...
 }

Here I wanted to reset the array consisting of 64 pointers. But instead, we will reset only the first element. The same error, by the way, is present again in another file. Thanks to Copy-Paste:

V568 It's odd that the sizeof () operator is the '& ImgIndex' expression. clist_mw extraimage.c 295

The correct version of the code should look like this:

  memset (& ImgIndex, 0, sizeof (ImgIndex));

By the way, taking an address from an array can only further confuse the person who is reading the code. Here taking an address has no effect. And the code can be written like this:

  memset (ImgIndex, 0, sizeof (ImgIndex));

The following example.

V568 It's odd that the sizeof () operator is the '& rowOptTA' expression. clist_modern modern_rowtemplateopt.cpp 258

  static ROWCELL * rowOptTA [100];

 void rowOptAddContainer (HWND htree, HTREEITEM hti)
 {
   ...
   ZeroMemory (rowOptTA, sizeof (& rowOptTA));
   ...
 }

Again, instead of calculating the size of the array, we calculate the size of the pointer. The correct expression is "sizeof (rowOptTA)". To clean the array, I suggest the following version of this code:

  const size_t ArraySize = 100;
 static ROWCELL * rowOptTA [ArraySize];
 ...
 std :: fill (rowOptTA, rowOptTA + ArraySize, nullptr);

I have already got used to the fact that such strings like to multiply according to the copy program:

V568 It's odd that the sizeof () operator is the '& rowOptTA' expression. clist_modern modern_rowtemplateopt.cpp 308

V568 It's odd that the sizeof () operator is the '& rowOptTA' expression. clist_modern modern_rowtemplateopt.cpp 438

Do you think that this is all about low-level work with arrays? No, not all. Look further, be afraid and beat the hands of fans to write memset.

V512 A call of the memset function will lead to a buffer overflow or underflow. clist_modern modern_image_array.cpp 59

  static BOOL ImageArray_Alloc (LP_IMAGE_ARRAY_DATA iad, int size)
 {
   ...
   memset (& iad-> nodes [iad-> nodes_allocated_size], 
     (size_grow - iad-> nodes_allocated_size) *
        sizeof (IMAGE_ARRAY_DATA_NODE),
     0);
   ...
 }

This time the size of the copied data is calculated correctly. Here are the second and third arguments that are confused in places. As a result, we fill 0 items. The correct option is:

  memset (& iad-> nodes [iad-> nodes_allocated_size], 0,
   (size_grow - iad-> nodes_allocated_size) *
      sizeof (IMAGE_ARRAY_DATA_NODE));

How to rewrite this code snippet more gracefully, I do not know. Rather, it cannot be rewritten beautifully, without affecting other parts of the program and data structures.

The question arises, how can we do without memset when working with such structures as OPENFILENAME:

  OPENFILENAME x;
 memset (& x, 0, sizeof (x));

Very simple. You can create a zeroed structure like this:

  OPENFILENAME x = {0};

2. Watch carefully whether you work with a signed or unsigned type.

The problem of confusion between signed and unsigned types may seem far-fetched at first glance. But this question is in vain underestimated by programmers.

In most cases, people do not like to consider the compiler warnings related to the fact that a variable of type int is compared with a variable of type unsigned. Indeed, most often such code is completely correct. Programmers disable such warnings or do not pay attention to them. Or there is a third option, they enter an explicit type conversion to silence the compiler warning, without going into details.

I suggest not doing this anymore and analyzing the situation each time when the sign type meets with the unsigned one. And, in general, be attentive to what type the expression has or what the function returns. Now a few examples on the subject.

V547 Expression 'wParam> = 0' is always true. Unsigned type value is always> = 0. clist_mw cluiframes.c 3140

The program code has a function id2pos, which returns the value '-1' in case of an error. This function is all right. Elsewhere, the result of the id2pos function is used as follows:

  typedef UINT_PTR WPARAM; 
 static int id2pos (int id);
 static int nFramescount = 0;

 INT_PTR CLUIFrameSetFloat (WPARAM wParam, LPARAM lParam)
 {
   ...
   wParam = id2pos (wParam);
   if (wParam> = 0 && (int) wParam <nFramescount)
     if (Frames [wParam] .floating)
   ...
 }

The problem is that the wParam variable is unsigned. Therefore, the condition 'wParam> = 0' is always true. If the id2pos function returns '-1', then the condition for checking invalid values does not work, and we will start using a negative index.

I'm pretty sure that the following code was written at first:

if (wParam> = 0 && wParam <nFramescount)

The Visual C ++ compiler issued a warning warning C4018: '<': signed / unsigned mismatch. This warning is included on Warning Level 3, with which Miranda IM is going. And at this moment insufficient attention was paid to this place. The warning was suppressed by an explicit type conversion. But the error did not disappear, but only hid. The correct code should be the following code:

if ((int) wParam> = 0 && (int) wParam <nFramescount)

So attention and more attention to such places. In Miranda IM, I counted 33 conditions that, because of the confusion with signed / unsigned, are always true or always false.

We continue. I especially like the following example. And the comment is just beautiful.

V547 Expression 'nOldLength <0' is always false. Unsigned type value is never <0. IRC mstring.h 229

  void Append (PCXSTR pszSrc, int nLength)
 {
   ...
   UINT nOldLength = GetLength ();
   if (nOldLength <0)
   {
     // protects from underflow
     nOldLength = 0;
   }
   ...
 }

I think no explanation for this code is required.

Of course, programmers are not always to blame for mistakes. Sometimes the developers of libraries (in this case WinAPI) also enclose the pig.

  #define SRMSGSET_LIMITNAMESLEN_MIN 0
 static INT_PTR CALLBACK DlgProcTabsOptions (...)
 {
   ...
   limitLength =
     GetDlgItemInt (hwndDlg, IDC_LIMITNAMESLEN, NULL, TRUE)> =
     SRMSGSET_LIMITNAMESLEN_MIN?
     GetDlgItemInt (hwndDlg, IDC_LIMITNAMESLEN, NULL, TRUE):
     SRMSGSET_LIMITNAMESLEN_MIN;
   ...
 }

If you do not pay attention to unnecessarily complex expression, the code looks correct. By the way, it was generally one line. For convenience, I broke it into several. However, the formatting of the code now we will not affect.

The problem is that the GetDlgItemInt () function does not return an 'int' at all, as the programmer expected. This function returns a UINT. Here is its prototype from the “WinUser.h” file:

  WINUSERAPI
 Uint
 Winapi
 GetDlgItemInt (
     __in HWND hDlg,
     __in int nIDDlgItem,
     __out_opt BOOL * lpTranslated,
     __in BOOL bSigned);

As a result, PVS-Studio displays the message:

V547 Expression is always true. Unsigned type value is always> = 0. scriver msgoptions.c 458

And indeed it is. The expression "GetDlgItemInt (hwndDlg, IDC_LIMITNAMESLEN, NULL, TRUE)> = SRMSGSET_LIMITNAMESLEN_MIN" is always true.

Specifically, in this case, perhaps, there will be no mistake. But the meaning of my warning, I hope, is clear. Look carefully at what functions return to you.

3. Avoid a lot of calculations in one line.

Every programmer knows and responsibly declares in discussions that it is necessary to write simple and clear code. But in practice, it sometimes seems that he is participating in a secret contest who will write a string more cunningly, using an interesting construction of the language or demonstrate the ability to juggle with pointers.

Most often, errors occur where the programmer, in order to create a compact code, collects several actions into one line at a time. With a slight gain in elegance, it significantly increases the risk of a typo or that it does not take into account side effects. Consider an example:

V567 Undefined behavior. The variable is modified while it is being used. msn ezxml.c 371

  short ezxml_internal_dtd (ezxml_root_t root, char * s, size_t len)
 {
   ...
   while (* (n = ++ s + strspn (s, EZXML_WS)) && * n! = '>') {
   ...
 }

This is where undefined behavior occurs. This code can function correctly for a long period of program life. But there is no guarantee that it will also behave after changing the version of the compiler or optimization keys. The compiler is right to first compute '++ s' and then call the function 'strspn (s, EZXML_WS)'. Or, on the contrary, at first it can call a function, and only then increase the value of the variable 's'.

I will give another example, why you should not try to collect everything in one line. In Miranda IM, some branches of the program execution are disabled / enabled using inserts of the form '&& 0'. Examples:

  if ((1 || altDraw) && ...
 if (g_CluiData.bCurrentAlpha == GoalAlpha && 0)
 if (checkboxWidth && (subindex == - 1 || 1)) {

With the given comparisons, everything is clear and well marked. Now imagine that you meet the fragment shown below. I formatted the code. The program is one line.

V560 A part of the conditional expression is always false: 0. clist_modern modern_clui.cpp 2979

  LRESULT CLUI :: OnDrawItem (UINT msg, WPARAM wParam, LPARAM lParam)
 {
   ...
   DrawState (dis-> hDC, NULL, NULL, (LPARAM) hIcon, 0,
     dis-> rcItem.right + dis-> rcItem.left-
     GetSystemMetrics (SM_CXSMICON)) / 2 + dx,
     (dis-> rcItem.bottom + dis-> rcItem.top-
     GetSystemMetrics (SM_CYSMICON)) / 2 + dx,
     0,0,
     DST_ICON |
     (dis-> itemState & ODS_INACTIVE && FALSE? DSS_DISABLED: DSS_NORMAL));
    ...
 }

If there is no error here, it is still not easy to remember and find the word FALSE in this line. Did you find him? Agree - not an easy task. And if this is a mistake? Then there is no chance at all to find it, just looking through the code. Such expressions are very useful to make separately. Example:

  UINT uFlags = DST_ICON;
 uFlags | = dis-> itemState & ODS_INACTIVE && FALSE?
             DSS_DISABLED: DSS_NORMAL;

And I myself, perhaps, would have written it in a long, but more understandable way:

  UINT uFlags;
 if (dis-> itemState & ODS_INACTIVE && (((FALSE))))
   uFlags = DST_ICON |  DSS_DISABLED;
 else 
   uFlags = DST_ICON |  DSS_NORMAL;

Yes, it is longer, but easy to read, and the word FALSE is better noticeable.

4. Align all that is possible in the code

Alignment of the code reduces the likelihood of typo or errors during Copy-Paste. If the error is still made, then finding it in the process of reviewing the code will be several times easier. Consider the sample code.

V537 Consider reviewing the correctness of 'maxX' item's usage. clist_modern modern_skinengine.cpp 2898

  static BOOL ske_DrawTextEffect (...)
 {
   ...
   minX = max (0, minX + mcLeftStart-2);
   minY = max (0, minY + mcTopStart-2);
   maxX = min ((int) width, maxX + mcRightEnd-1);
   maxY = min ((int) height, maxX + mcBottomEnd-1);
   ...
 }

Monolithic code snippet, which is not interesting to read. Format it:

  minX = max (0, minX + mcLeftStart - 2);
 minY = max (0, minY + mcTopStart - 2);
 maxX = min ((int) width, maxX + mcRightEnd - 1);
 maxY = min ((int) height, maxX + mcBottomEnd - 1);

This is not the most illustrative example, but you must admit, now it has become much easier to notice that the maxX variable is used twice.

My alignment recommendation should not be taken literally and everywhere to build columns of code. Firstly, it requires additional time when writing and editing code. Secondly, it can lead to other errors. In the following example, just the desire to make a beautiful column led to an error in the Miranda IM code.

V536 Be advised that the utilized form. Oct: 037, Dec: 31. msn msn_mime.cpp 192

  static const struct _tag_cpltbl
 {
   unsigned cp;
   const char * mimecp;
 } cptbl [] =
 {
   {037, "IBM037"}, // IBM EBCDIC US-Canada 
   {437, "IBM437"}, // OEM United States 
   {500, "IBM500"}, // IBM EBCDIC International 
   {708, "ASMO-708"}, // Arabic (ASMO 708) 
   ...
 }

Wanting to make a beautiful column of numbers, it is easy to get carried away and enter '0' at the beginning, making the constant octal.

I specify the recommendation. Align in the code everything that is possible. But do not align the numbers by appending zeros.

5. Do not multiply the string more than once.

Copying lines while programming is inevitable. But you can secure yourself without pasting a string from the clipboard several times at once. In most cases, it is better to copy the string, then edit it. Copy and edit again. And so on. So it’s harder to forget to change something in the string or change it incorrectly. Consider the sample code:

V525 The code containing the collection of similar blocks. Check items '1316', '1319', '1318', '1323', '1323', '1317', '1321' in lines 954, 955, 956, 957, 958, 959, 960. clist_modern modern_clcopts.cpp 954

  static INT_PTR CALLBACK DlgProcTrayOpts (...)
 {
   ...
   EnableWindow (GetDlgItem (hwndDlg, IDC_PRIMARYSTATUS), TRUE);
   EnableWindow (GetDlgItem (hwndDlg, IDC_CYCLETIMESPIN), FALSE);
   EnableWindow (GetDlgItem (hwndDlg, IDC_CYCLETIME), FALSE);    
   EnableWindow (GetDlgItem (hwndDlg, IDC_ALWAYSPRIMARY), FALSE);
   EnableWindow (GetDlgItem (hwndDlg, IDC_ALWAYSPRIMARY), FALSE);
   EnableWindow (GetDlgItem (hwndDlg, IDC_CYCLE), FALSE);
   EnableWindow (GetDlgItem (hwndDlg, IDC_MULTITRAY), FALSE);
   ...
 }

Most likely, there is no real error here. Just work twice with the IDC_ALWAYSPRIMARY element. However, it is very easy to make mistakes in such blocks of copied lines.

6. Expose a high level of warnings to the compiler and use static analyzers.

For many errors it is impossible to give recommendations on how to reduce the likelihood of their occurrence in the code. Most often, they are typos that both the novice and the most highly qualified programmer allow.

However, many of these errors can be found even at the stage of writing code. First of all, these are compiler warnings. And in the second - reports from the night run static code analyzers.

Now someone will say that this is a poorly hidden advertisement. But in fact, this is just another recommendation that reduces the number of errors. If I found errors using static analysis, and I cannot say how to avoid their appearance in the code, then the recommendation is to use code analyzers.

Consider some examples of errors that can be quickly identified by source code analyzers:

V560 A part of the conditional expression is always true: 0x01000. tabsrmm tools.cpp 1023

  #define GC_UNICODE 0x01000

 DWORD dwFlags;

 UINT CreateGCMenu (...)
 {
   ...
   if (iIndex == 1 && si-> iType! = GCW_SERVER &&
       ! (si-> dwFlags && GC_UNICODE)) {
   ...
 }

Mistakened. Instead of the '&' operator, the '&&' operator is used. How to err when writing code, I do not know. The correct version of the condition:

  (si-> dwFlags & GC_UNICODE)

The following example.

V528 It is odd that pointer to 'char' type is compared with the '\ 0' value. Probably meant: * str! = '\ 0'. clist_modern modern_skinbutton.cpp 282

V528 It is odd that pointer to 'char' type is compared with the '\ 0' value. Probably meant: * endstr! = '\ 0'. clist_modern modern_skinbutton.cpp 283

  static char * _skipblank (char * str)
 {
   char * endstr = str + strlen (str);
   while ((* str == '' || * str == '\ t') && str! = '\ 0') str ++;
   while ((* endstr == '' || * endstr == '\ t') &&
          endstr! = '\ 0' && endstr <str)
     endstr--;
   ...
 }

In this code, only two asterisks '*' are forgotten for pointer dereferencing. The result can be fatal. This code is predisposed to access violation. The correct code is:

  while ((* str == '' || * str == '\ t') && * str! = '\ 0') str ++;
 while ((* endstr == '' || * endstr == '\ t') &&
        * endstr! = '\ 0' && endstr <str)
   endstr--;

Here again it is difficult to give advice, apart from using special tools for checking the code.

The following example.

V514 Dividing sizeof a pointer 'sizeof (text)' by another value. There is a possibility of logical error presence. clist_modern modern_cachefuncs.cpp 567

  #define SIZEOF (X) (sizeof (X) / sizeof (X [0]))

 int Cache_GetLineText (..., LPTSTR text, int text_size, ...)
 {
   ...
   tmi.printDateTime (pdnce-> hTimeZone, _T ("t"), text, SIZEOF (text), 0);
   ...
 }

At first glance, everything is fine. The text and its length, calculated using the SIZEOF macro, is transferred to the function. In fact, the macro should be called COUNT_OF, but that's not the point. The trouble is that we are trying to count the number of characters in the index. Here, “sizeof (LPTSTR) / sizeof (TCHAR)” is calculated. A person notices such suspicious places badly, but the compiler and the static analyzer are good. Corrected code:

  tmi.printDateTime (pdnce-> hTimeZone, _T ("t"), text, text_size, 0);

Following example

V560 A part of the conditional expression is always true: 0x29. icqoscar8 fam_03buddy.cpp 632

  void CIcqProto :: handleUserOffline (BYTE * buf, WORD wLen)
 {
   ...
   else if (wTLVType = 0x29 && wTLVLen == sizeof (DWORD))
   ...
 }

Here the recommendation is appropriate to write a constant in the condition in the first place. Here such code simply will not compile:

  if (0x29 = wTLVType && sizeof (DWORD) == wTLVLen)

But many programmers, and I among them, do not like this style. For example, it prevents me from reading the code, because first I want to know which variable is being compared, and only then with what exactly.

If the programmer refuses this style of comparisons, then he is left with either to rely on the compiler / analyzer, or to take risks.

By the way, despite the fact that everyone knows about this error, it is not the rarest. Three more examples from Miranda IM, where the PVS-Studio analyzer issued a warning V559 :

  else if (ft-> ft_magic = FT_MAGIC_OSCAR)
 if (ret = 0) {return (0);}
 if (Drawing-> type = CLCIT_CONTACT)

Analysis of the code also allows you to identify if not errors, then very suspicious places in the code. For example, in Miranda IM, pointers are used not only as pointers. If in some places of the code such games look normal, in others they are scary. An example of code that is extremely alarming to me:

V542 Consider inspecting an odd type cast: 'char *' to 'char'. clist_modern modern_toolbar.cpp 586

  static void
 sttRegisterToolBarButton (..., char * pszButtonName, ...)
 {
   ...
   if ((BYTE) pszButtonName)
     tbb.tbbFlags = TBBF_FLEXSIZESEPARATOR;
   else
     tbb.tbbFlags = TBBF_ISSEPARATOR;
   ...
 }

In fact, we check that the address of the string is not a multiple of 256. I do not understand what we really wanted to write in the condition. Perhaps this is even correct, but very doubtful.

By analyzing the code, you can find a lot of incorrect conditions. Example:

V501 There are identical sub-expressions 'user-> statusMessage' operator. jabber jabber_chat.cpp 214

  void CJabberProto :: GcLogShowInformation (...)
 {
   ...
   if (user-> statusMessage && user-> statusMessage)
   ...
 }

And so on and so forth. I can give other examples for a long time, but it does not make sense. It is important that a large number of errors can be detected using static analysis at the earliest stages.

When a static analyzer finds few errors in your program, its use does not seem interesting. But this is not the right way to reason. After all, many of the errors that the analyzer could find in the early stages were corrected by sweat and blood, debugging for hours.

Static analysis is of more interest in the program development process, and not as one-time checks. Many errors and typos are in the process of testing and creating unit tests. But if some of these errors can be found at the stage of writing the code, it will be a huge gain of time and effort. It's a shame to debug the program for two hours, in order to later notice an extra semicolon ';' after the 'for' operator. This error can often be neutralized by spending 10 minutes on a static analysis of files that have been modified during the operation.

Conclusion

In this article, I shared only a few thoughts on how to make less mistakes when programming in C ++. Other thoughts are about to ripen, which I will try to write about in the following articles and notes.

PS

It has already become a tradition that after a similar article, someone asks if you have informed the developers of the program / library about errors found. I will answer in advance the question if we sent a bug report on the Miranda IM project.

No, not sent. This is too resource-intensive task. The article shows only a small part of what was found. There are about a hundred fragments in the project, where the code should be corrected, and more than a hundred, where I just don’t know if this is a mistake or not. However, a translation of this article will be sent to the authors of Miranda IM and they will be offered a free version of the PVS-Studio analyzer. If they show interest, they will be able to check the source code themselves and correct what they see fit.

It is also worth explaining why I often do not venture to judge the error of a particular piece of code or not. An example of an ambiguous code:

V523 The 'then' statement is equivalent to the 'else' statement. scriver msglog.c 695

  if (streamData-> isFirst) {
   if (event-> dwFlags & IEEDF_RTL) {
     AppendToBuffer (& buffer, & bufferEnd, & bufferAlloced, "\\ rtlpar");
   } else {
     AppendToBuffer (& buffer, & bufferEnd, & bufferAlloced, "\\ ltrpar");
   }
 } else {
   if (event-> dwFlags & IEEDF_RTL) {
     AppendToBuffer (& buffer, & bufferEnd, & bufferAlloced, "\\ rtlpar");
   } else {
     AppendToBuffer (& buffer, & bufferEnd, & bufferAlloced, "\\ ltrpar");
   }
 }

Here are two identical code snippets. Perhaps this is a mistake. And perhaps now in any branch need the same set of actions. And the code is written specifically so that later it was easy to modify. Here it is necessary to know the program in order to sort out this place erroneously or not.

Source: https://habr.com/ru/post/115143/

All Articles