⬆️ ⬇️

Static analysis of the source code on the example of WinMerge

Today I want to dedicate a post to the topic, why source code analysis tools are useful regardless of the level of knowledge and experience of the programmer. And the benefit of such an analysis will be demonstrated by the example of a tool that is known to all programmers - WinMerge.





The earlier the error in the application code is detected, the cheaper it is to fix it. Hence the conclusion that the cheapest and simplest error can be eliminated in the process of writing code. And even better, if the error is not written at all. I just wanted to make a mistake, so immediately clap your hands and the code is already written correctly. But somehow it does not work. The “need to write without errors” approach still does not work.



Even a highly skilled programmer who is not in a hurry, makes mistakes, starting from the simplest typos and ending with logical errors in algorithms. This is where the law of large numbers works. It seems that it is impossible to make a mistake in each specific “if” statement. And wrote 200 comparisons, and once, yes was wrong. Andrei Urazov is interested in this in his report “Quality-Oriented Programming” at the CodeFest 2010 conference ( see the recording of the speech ). Briefly, I want to give his next thought. No matter how experienced developers are, errors still appear in the code. These errors cannot be stopped doing. But many of them can be successfully fought at a much earlier stage than is usually done.

')

Usually, the very first level of error defense is the creation of unit tests on newly written code. Sometimes tests are written before the code that they will check. However, unit tests have a number of drawbacks, which I will not discuss in detail, since they are already known to programmers. It is not always easy to create a unit test for a function that requires a lot of preliminary data preparation. Unit tests become a burden if project requirements change quickly. Tests take a lot of time to write and maintain. Tests are not always easy to cover all the branches. And you can also receive as a gift a monolithic project in which unit tests simply do not exist and were not planned. Without denying the enormous usefulness of unit tests, I believe that although this is a good defensive line, it can and should be significantly strengthened.



Often, programmers neglect an even earlier level of defense — static code analysis. Many use the ability to analyze the code, without going beyond the diagnostic warnings issued by compilers. Meanwhile, there is a whole class of tools that make it possible to identify at the coding stage a significant percentage of logical errors and simple typos. These tools perform higher-level code checking, relying on the knowledge of some coding patterns, use heuristic algorithms, and have a flexible configuration system.



Static analysis, of course, also has flaws. He simply cannot detect many kinds of errors. Analyzers give false positives and force you to make such adjustments in the code so that they like this code and then be rated as safe.



But there are huge benefits. The analysis covers all branches of the program, regardless of the frequency of their use. The analysis does not depend on the stage of execution. You can check even unfinished code. You can check a large amount of code inherited by you. Static analysis is fast and scales well in contrast to dynamic validation tools.



Sounded a lot of words about static source code analysis. Now it's time to pay attention to practice. I'll take one application written in C ++ and try to find errors in it.



I wanted to choose something small and well-known. Since I do not use so many tools, then scrolling through the list of programs in the Start menu, my choice was WinMerge. WinMerge is available in source code , small (about 186,000 lines). The program is of sufficient quality. I speak on the basis of the fact that I use it without any complaints, and that 25% of the source code is occupied by comments (a good sign). In general, the optimal choice.



The latest available version 2.13.20 (dated 10/20/2010) was downloaded. For the analysis, I used the prototype of a general-purpose analyzer that we are developing. I will dwell on this in a bit more detail.



Now, the PVS-Studio static analyzer includes two sets of rules. One is designed to detect 64-bit defects , and the other is to check OpenMP programs . We are currently developing a general rule set. Even the beta version is not yet available, but something is already working and I really want a little bit of a real war with errors. We plan to make a new set of rules for free, so please do not write about self-promotion. To the public, we will present a new tool in 1-2 months within PVS-Studio 4.00.



So, here are some interesting points that I found in the WinMerge-2.13.20 source code within half an hour (15 minutes test, 15 minutes review of the results). There are other suspicious places, but it takes effort to figure out if there really is a mistake or not. Now I have no task to find as many defects as possible in one project. I would like to gracefully show the usefulness of static analysis and how even a quick study can quickly reveal a number of errors.



Example one. The analyzer pointed me to several errors “V530 - The return value of the function 'Foo' is required to be utilized”. Typically, these warnings occur due to improper use of functions. Consider the code snippet:

  / **
 * @brief get for each item.
 * @note Return empty strings if item is special item.
 * /
 void CDirView :: GetItemFileNames (int sel,
   String & strLeft, String & strRight) const
 {
   UINT_PTR diffpos = GetItemKey (sel);
   if (diffpos == (UINT_PTR) SPECIAL_ITEM_POS)
   {
     strLeft.empty ();
     strRight.empty ();
   }
   else
   {
      ...
   }
 } 


The function should return two blank lines in a certain situation. However, due to inattention, instead of std :: string :: clear (), the functions std :: string :: empty () are called. This, by the way, is not such a rare mistake as it may seem. I met her in many other projects. This includes the other WinMerge functions:

  / **
 * @brief Clear variant's value (reset to defaults).
 * /
 void VariantValue :: Clear ()
 {
   m_vtype = VT_NULL;
   m_bvalue = false;
   m_ivalue = 0;
   m_fvalue = 0;
   m_svalue.empty ();
   m_tvalue = 0;
 } 


Here again, the expected row cleaning does not occur.



But the warning “V501 - There are identical sub-expressions” was triggered. operator ":

  BUFFERTYPE m_nBufferType [2];
 ...
 // Handle unnamed buffers
 if ((m_nBufferType [nBuffer] == BUFFER_UNNAMED) ||
     (m_nBufferType [nBuffer] == BUFFER_UNNAMED))
   nSaveErrorCode = SAVE_NO_FILENAME; 


If you look at the code next, then by analogy it should be written here:

  (m_nBufferType [0] == BUFFER_UNNAMED) ||
 (m_nBufferType [1] == BUFFER_UNNAMED) 


And if not, then there is still some kind of error.



If various emergencies occur, WinMerge will try to report errors, but in many cases it will fail. By the way, this is a good example of how a code analyzer can detect errors in rarely used program sections. There are several errors in the code, about which PVS-Studio reports like this: "V510 - The 'Format' is not expected to receive class-type variable as 'N' actual argument". Code example:

  String GetSysError (int nerr);
 ...
 CString msg;
 msg.Format (
 _T ("Failed to open registry key HKCU /% s: \ n \ t% d:% s"),
 f_RegDir, retVal, GetSysError (retVal)); 


At first glance, everything is fine. That's just the type "String" is nothing more than "std :: wstring". And consequently, at best, we will print abracadabra, and at worst, there will be a memory access error (Access Violation). Instead of a pointer to a string, an object of the type “std :: wstring” is placed. I described this situation in more detail in the article “ Big brother helps you ”. A valid code must contain a call to c_str ():

  msg.Format (
 _T ("Failed to open registry key HKCU /% s: \ n \ t% d:% s"),
 f_RegDir, retVal, GetSysError (retVal) .c_str ()); 


Let's go further. Here is a very suspicious code. There is a mistake here or not, I do not know. But it is strange that the two branches of the “if” operator contain completely identical code. The analyzer warned about this situation with the diagnostic message "V532 - The 'then' statement is equivalent to the 'else' statement". Here it is this suspicious code:

  if (max <INT_MAX)
 {
   for (i = min; i <max; i ++)
   {
     if (eptr> = md-> end_subject ||
         IS_NEWLINE (eptr))
       break;
     eptr ++;
     while (eptr <md-> end_subject &&
            (* eptr & 0xc0) == 0x80)
       eptr ++;
     }
   }
 else
 {
   for (i = min; i <max; i ++)
   {
     if (eptr> = md-> end_subject ||
         IS_NEWLINE (eptr))
       break;
     eptr ++;
     while (eptr <md-> end_subject &&
            (* eptr & 0xc0) == 0x80)
       eptr ++;
     }
   }
 } 


Here I feel that here: “It’s not for nothing.”



Well, one more example and finish, perhaps. The analyzer found a suspicious loop: "V534 - it is likely that the variable is being compared inside the 'for' operator. Consider reviewing 'i'. ”Source code:

  // Get the length of the translated array of bytes from text.
 int Text2BinTranslator :: iLengthOfTransToBin (
   char * src, int srclen)
 {
   ...
     for (k = i; i <srclen; k ++)
     {
       if (src [k] == '>>)
         break;
     }
   ...
 } 


This code is predisposed to Access Violation. The loop must continue until a '>' character is found or a string of 'srclen' characters in length ends. That's just by chance for comparison used the variable 'i', and not 'k'. If the '>' character is not found, then everything is likely to end sadly.



Conclusion



Do not forget about static analysis. He can often find interesting things even in good code. And come back to our website later to try a free general-purpose analyzer when it’s ready.

Source: https://habr.com/ru/post/107235/



All Articles