
The consequences of using Copy-Paste when programming in C++ and how to deal with them

I am developing the PVS-Studio analyzer, which detects errors in the source code of C/C++/C++0x applications. Because of this, I look through a large amount of source code from various applications in which PVS-Studio has flagged suspicious fragments. I have collected plenty of examples where it is clear that an error appeared because a fragment of code was copied and then modified. Of course, it is hardly news that Copy-Paste in programming is bad. But let's not settle for the recommendation "do not copy code" and look at the topic more closely.


Usually, when people talk about Copy-Paste in programming, they mean the following situation: a function or a large code fragment is copied in its entirety, and the copy is then modified. This produces a lot of similar code in the program, which makes it hard to maintain. You have to change the same parts of an algorithm in several functions, and it is very easy to forget to fix one of them.

In this case, it really is appropriate to say that it is better not to copy code. If you need a function with similar behavior, refactor and move the common code into separate methods or classes [1], or use templates and lambda functions. We will not go into more detail on how to avoid this kind of duplication, since it is not the main subject of this article. The main point is that code should not be duplicated across functions. A lot has been written about this, and most programmers are familiar with the recommendations.
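As a minimal sketch of what "isolating the common code" with a template and lambdas can look like: the function names below are made up for illustration and do not come from any of the projects discussed in this article.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: the shared loop lives in one place, and the part
// that differs between the two functions is passed in as a callable.
template <typename Transform>
double accumulateTransformed(const std::vector<double>& values,
                             Transform transform) {
    double total = 0.0;
    for (double v : values) {
        total += transform(v);
    }
    return total;
}

// Two functions with similar behavior, written without copying the loop.
double sumOfSquares(const std::vector<double>& values) {
    return accumulateTransformed(values, [](double v) { return v * v; });
}

double sumOfAbsoluteValues(const std::vector<double>& values) {
    return accumulateTransformed(values, [](double v) { return std::fabs(v); });
}
```

If the loop ever needs to change, it changes in one place instead of in every copy.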
Let's now focus on something that books and articles about writing high-quality code usually keep quiet about. In fact, it is impossible to program without Copy-Paste. It is like sex in the Soviet Union: officially it does not exist, but everyone does it.

We all copy small pieces of code when we need to write something like this:

  GetMenu()->CheckMenuItem(IDC_LINES_X, MF_BYCOMMAND | nState);
  GetMenu()->CheckMenuItem(IDC_LINES_Y, MF_BYCOMMAND | nState);

Let's admit it honestly: we are too lazy to type a line that differs only in having the character 'Y' where the previous one had 'X'. And that is right and reasonable. Copying and editing a line is faster than typing the second one from scratch, even with tools such as Visual Assist and IntelliSense.

Note that there is no point in talking about code duplication here. However hard you think, there is no simpler way to write this. You can find a huge number of similar examples in any program. If you do not like that this example comes from GUI code, here is a similar one from a different kind of task:

  int texlump1 = Wads.CheckNumForName("TEXTURE1", ns_global, wadnum);
  int texlump2 = Wads.CheckNumForName("TEXTURE2", ns_global, wadnum);

The trouble is that with this kind of "microcopying" the probability of an error is also quite high. And since small copies like these are far more common than copies of large blocks, this is a really important problem. It is not clear how to deal with it, so people prefer to keep silent about it. You cannot forbid programmers to copy code; that would be absurd.

Many of these errors are detected the first time the program is run and are fixed quickly and painlessly. But many remain and live in the code for years, waiting for their moment. Such errors are hard to find, because reviewing similar-looking lines is tedious and a person's attention quickly dulls. Moreover, the presence of Copy-Paste errors hardly depends on the programmer's skill. Anyone can make a typo or overlook something. Defects of this kind occur even in very well-known, high-quality software products.

To make it clearer what kind of errors we are talking about, let's look at a few code examples taken from open-source projects. As a bit of advertising: I found the errors listed here using the general-purpose analyzer included in PVS-Studio [2].

The code is taken from Audacity, a sound recording and editing program.

  sampleCount VoiceKey::OnBackward(...) {
    ...
    int atrend = sgn(
      buffer[samplesleft - 2] - buffer[samplesleft - 1]);
    int ztrend = sgn(
      buffer[samplesleft - WindowSizeInt-2] -
        buffer[samplesleft - WindowSizeInt-2]);
    ...
  }

The programmer bravely and correctly wrote the initialization of the variable 'atrend'. Then he started on the initialization of 'ztrend' and wrote "sgn(buffer[samplesleft - WindowSizeInt-2]". Then he sighed, copied a piece of the line, and forgot to edit it. As a result, the 'sgn' function receives the value 0 as its argument.
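The effect can be reproduced in isolation. The sketch below uses a hypothetical stand-in for 'sgn' and a plain vector instead of Audacity's buffer. The helper 'trendBefore' reflects only an assumption about the intent (that 'ztrend' was meant to compare two neighbouring samples, by analogy with 'atrend'), so treat it as an illustration rather than the actual fix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's sgn(): returns -1, 0 or 1.
int sgn(double x) {
    return (x > 0.0) - (x < 0.0);
}

// The expression as written in the snippet: both operands use the same
// index, so the difference is 0 for any finite sample value.
int ztrendAsWritten(const std::vector<double>& buffer,
                    std::size_t samplesleft, std::size_t windowSizeInt) {
    return sgn(buffer[samplesleft - windowSizeInt - 2] -
               buffer[samplesleft - windowSizeInt - 2]);
}

// Assumed intent, by analogy with 'atrend': compare the two samples just
// before position 'end'. The index pattern is written once, so there is
// nothing left to copy and forget to edit.
int trendBefore(const std::vector<double>& buffer, std::size_t end) {
    return sgn(buffer[end - 2] - buffer[end - 1]);
}
```

With such a helper, the two initializations would read 'trendBefore(buffer, samplesleft)' and 'trendBefore(buffer, samplesleft - WindowSizeInt)', and the part that differs is a single argument.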

The next scenario is similar. A programmer writes a long condition in the Crystal Space 3D SDK:

  inline_ bool Contains(const LSS& lss)
  {
    // We check the LSS contains the two
    // spheres at the start and end of the sweep
    return
      Contains(Sphere(lss.mP0, lss.mRadius)) &&
      Contains(Sphere(lss.mP0, lss.mRadius));
  }

Here one is tempted to simply copy "Contains(Sphere(lss.mP0, lss.mRadius))" and replace the name 'mP0' with 'mP1'. But it is so easy to forget to do that.

You have probably noticed that program windows sometimes start behaving strangely. For example, many programmers will remember the search box in the first release of Visual Studio 2010. I think such oddities are caused by a coincidence of circumstances and code like this:

  void COX3DTabViewContainer::OnNcPaint()
  {
    ...
    if (rectClient.top < rectClient.bottom &&
        rectClient.top < rectClient.bottom)
    {
      dc.ExcludeClipRect(rectClient);
    }
    ...
  }

This code is taken from the well-known Ultimate ToolBox class set. Whether the control is drawn correctly or not depends on its location.

And in the eLynx Image Processing SDK, a whole line was copied, which replicated a typo.

  void uteTestRunner::StressBayer(uint32 iFlags)
  {
    ...
    static EPixelFormat ms_pfList[] =
      {PF_Lub, PF_Lus, PF_Li, PF_Lf, PF_Ld};
    const int fsize = sizeof(ms_pfList) / sizeof(ms_pfList);

    static EBayerMatrix ms_bmList[] =
      {BM_GRBG, BM_GBRG, BM_RGGB, BM_BGGR, BM_None};
    const int bsize = sizeof(ms_bmList) / sizeof(ms_bmList);
    ...
  }

Because the pointer dereference was forgotten, the variable 'fsize' is equal to 1. This code was then adapted to initialize 'bsize'. I do not believe that one can make the same mistake twice in a row without copying the code.
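To make the arithmetic concrete, here is a small sketch with a stand-in enum (the real 'EPixelFormat' belongs to the SDK; the helper 'arraySize' is a hypothetical name). It shows the expression from the snippet, the classic element-size idiom, and a template that lets the compiler deduce the element count.

```cpp
#include <cstddef>

// Hypothetical enum standing in for the SDK's EPixelFormat.
enum EPixelFormat { PF_Lub, PF_Lus, PF_Li, PF_Lf, PF_Ld };

static EPixelFormat ms_pfList[] = {PF_Lub, PF_Lus, PF_Li, PF_Lf, PF_Ld};

// As in the snippet: the size of the array divided by itself is always 1.
const int fsizeAsWritten = sizeof(ms_pfList) / sizeof(ms_pfList);

// Classic idiom: divide by the size of one element.
const int fsizeByElement = sizeof(ms_pfList) / sizeof(ms_pfList[0]);

// Template helper: the compiler deduces N from the array type, so there is
// no second operand to mistype, and passing a pointer does not compile.
template <typename T, std::size_t N>
constexpr std::size_t arraySize(const T (&)[N]) {
    return N;
}

const std::size_t fsizeDeduced = arraySize(ms_pfList);
```

Since C++17 the standard library offers std::size in <iterator>, which does the same job as the hypothetical 'arraySize' above.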

In the EIB Suite project, the line "if (_relativeTime <= 143)" was copied and edited. But in the last condition, the programmer forgot to change it:

  string TimePeriod::toString() const
  {
    ...
    if (_relativeTime <= 143)
      os << ((int)_relativeTime + 1) * 5 << _("minutes");
    else if (_relativeTime <= 167)
      os << 12 * 60 + ((int)_relativeTime - 143) * 30 << _("minutes");
    else if (_relativeTime <= 196)
      os << (int)_relativeTime - 166 << _("days");
    else if (_relativeTime <= 143)
      os << (int)_relativeTime - 192 << _("weeks");
    ...
  }

So the code "os << (int)_relativeTime - 192 << _("weeks");" never gets control.
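Why the last branch is dead can be checked on a simplified model of the same if-else chain. The function name, the returned labels and the "unhandled" fallback below are illustrative assumptions, not EIB Suite code. The threshold that was actually intended for the last branch is not known, so the sketch only reproduces the chain as written.

```cpp
#include <string>

// Simplified model of the branch structure from the snippet.
std::string unitFor(int relativeTime) {
    if (relativeTime <= 143)
        return "minutes";
    else if (relativeTime <= 167)
        return "minutes";
    else if (relativeTime <= 196)
        return "days";
    else if (relativeTime <= 143)  // same test as the first branch
        return "weeks";
    return "unhandled";
}
```

Any value that reaches the fourth test is already greater than 196, so "relativeTime <= 143" cannot be true there, and "weeks" is never returned.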

Even programmers at Intel are just programmers, not demigods. Here is an unsuccessful copy in the TickerTape project:

  void DXUTUpdateD3D10DeviceStats(...)
  {
    ...
    else if (DeviceType == D3D10_DRIVER_TYPE_SOFTWARE)
      wcscpy_s(pstrDeviceStats, 256, L"WARP");
    else if (DeviceType == D3D10_DRIVER_TYPE_HARDWARE)
      wcscpy_s(pstrDeviceStats, 256, L"HARDWARE");
    else if (DeviceType == D3D10_DRIVER_TYPE_SOFTWARE)
      wcscpy_s(pstrDeviceStats, 256, L"SOFTWARE");
    ...
  }

The "DeviceType == D3D10_DRIVER_TYPE_SOFTWARE" condition appears twice, so the "SOFTWARE" branch can never be reached.

In general, it is very easy to overlook an error in a thicket of conditional statements. In the implementation of Multi-threaded Dynamic Queue, the same thing is done regardless of what the IsFixed() function returns:

  BOOL CGridCellBase::PrintCell(...)
  {
    ...
    if (IsFixed())
      crFG = (GetBackClr() != CLR_DEFAULT) ?
        GetTextClr() : pDefaultCell->GetTextClr();
    else
      crFG = (GetBackClr() != CLR_DEFAULT) ?
        GetTextClr() : pDefaultCell->GetTextClr();
    ...
  }

By the way, copying code is so easy and pleasant that adding one more line costs nothing. :)

  void RB_CalcColorFromOneMinusEntity(unsigned char *dstColors) {
    ...
    unsigned char invModulate[3];
    ...
    invModulate[0] = 255 - backEnd.currentEntity->e.shaderRGBA[0];
    invModulate[1] = 255 - backEnd.currentEntity->e.shaderRGBA[1];
    invModulate[2] = 255 - backEnd.currentEntity->e.shaderRGBA[2];
    invModulate[3] = 255 - backEnd.currentEntity->e.shaderRGBA[3];
    ...
  }

And never mind that the 'invModulate' array consists of only three elements. The code is taken from the project of the legendary game Wolfenstein 3D.
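One way to avoid writing the index by hand four times is to take the loop bound from the array itself. The sketch below is not the game's code: 'Entity' and 'invertChannels' are hypothetical names, and whether the original array was meant to hold three or four elements is not known, so the element count is left as a template parameter.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the entity colour data used in the snippet.
struct Entity {
    unsigned char shaderRGBA[4];
};

// Computes the inverted colour for the first N channels. The loop bound
// comes from the destination array itself, so it cannot run past the end.
template <std::size_t N>
std::array<unsigned char, N> invertChannels(const Entity& e) {
    static_assert(N <= 4, "shaderRGBA has only four channels");
    std::array<unsigned char, N> invModulate{};
    for (std::size_t i = 0; i < invModulate.size(); ++i) {
        invModulate[i] = static_cast<unsigned char>(255 - e.shaderRGBA[i]);
    }
    return invModulate;
}
```

A call such as 'invertChannels<3>(entity)' fills exactly three elements, and asking for more than four channels is rejected at compile time.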

And finally, a more complicated example. The code is taken from the very useful tool Notepad++.

  void KeyWordsStyleDialog::updateDlg()
  {
    ...
    Style & w1Style =
      _pUserLang->_styleArray.getStyler(STYLE_WORD1_INDEX);
    styleUpdate(w1Style, _pFgColour[0], _pBgColour[0],
      IDC_KEYWORD1_FONT_COMBO, IDC_KEYWORD1_FONTSIZE_COMBO,
      IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK,
      IDC_KEYWORD1_UNDERLINE_CHECK);

    Style & w2Style =
      _pUserLang->_styleArray.getStyler(STYLE_WORD2_INDEX);
    styleUpdate(w2Style, _pFgColour[1], _pBgColour[1],
      IDC_KEYWORD2_FONT_COMBO, IDC_KEYWORD2_FONTSIZE_COMBO,
      IDC_KEYWORD2_BOLD_CHECK, IDC_KEYWORD2_ITALIC_CHECK,
      IDC_KEYWORD2_UNDERLINE_CHECK);

    Style & w3Style =
      _pUserLang->_styleArray.getStyler(STYLE_WORD3_INDEX);
    styleUpdate(w3Style, _pFgColour[2], _pBgColour[2],
      IDC_KEYWORD3_FONT_COMBO, IDC_KEYWORD3_FONTSIZE_COMBO,
      IDC_KEYWORD3_BOLD_CHECK, IDC_KEYWORD3_BOLD_CHECK,
      IDC_KEYWORD3_UNDERLINE_CHECK);

    Style & w4Style =
      _pUserLang->_styleArray.getStyler(STYLE_WORD4_INDEX);
    styleUpdate(w4Style, _pFgColour[3], _pBgColour[3],
      IDC_KEYWORD4_FONT_COMBO, IDC_KEYWORD4_FONTSIZE_COMBO,
      IDC_KEYWORD4_BOLD_CHECK, IDC_KEYWORD4_ITALIC_CHECK,
      IDC_KEYWORD4_UNDERLINE_CHECK);
    ...
  }

You would have to strain your eyes to spot the error here. So I will shorten the code for clarity:

  styleUpdate(...
    IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK,
    ...);
  styleUpdate(...
    IDC_KEYWORD2_BOLD_CHECK, IDC_KEYWORD2_ITALIC_CHECK,
    ...);
  styleUpdate(...
    IDC_KEYWORD3_BOLD_CHECK, IDC_KEYWORD3_BOLD_CHECK,
    ...);
  styleUpdate(...
    IDC_KEYWORD4_BOLD_CHECK, IDC_KEYWORD4_ITALIC_CHECK,
    ...);

The developer's hand slipped, and he copied the wrong resource name: IDC_KEYWORD3_BOLD_CHECK is passed where IDC_KEYWORD3_ITALIC_CHECK was expected.

I could go on citing defective code, but that would no longer be interesting. With all these examples, I just wanted to show that such errors are present in a wide variety of projects and are made by beginners and professionals alike. Let us finally turn to the question of what to do about all this.

To be honest, I do not know the complete answer. At least, I have not read about such situations in books. But in practice I have often come across the consequences of small Copy-Paste in programs, including my own. So I will have to improvise in answering the question.

We will proceed from the following premise:

Programmers copy code fragments and will keep doing so because it is convenient. Consequently, such errors will always occur in programs.

From this follows the conclusion:

It is impossible to prevent such errors completely, but you can try to reduce the likelihood of making them.

I see two ways to reduce the number of errors of this kind. First, it makes sense to use tools such as static code analyzers. They can detect many errors of this class, and they do so at the earliest stages. It is cheaper and easier to find and fix an error right after the code is written than to deal with the same error once it is found during testing.

The second way, which helps reduce the number of errors in some cases, is to discipline yourself and format copied code in a special way. Let me explain with an example.

  int ztrend = sgn(
    buffer[samplesleft - WindowSizeInt-2] - buffer[samplesleft - WindowSizeInt-2]);

Written this way, the error is much harder to notice than if the code looked like this:

  int ztrend = sgn(
    buffer[samplesleft - WindowSizeInt-2] -
    buffer[samplesleft - WindowSizeInt-2]);

You should format code so that the parts that must differ are visually lined up in a column. That makes it much harder to make a mistake. Of course, in many cases this will not save you, and I gave such examples above. Still, something is better than nothing.
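The same column idea can be applied to the Notepad++ case by putting the differing identifiers into a small table. This is only a sketch under assumptions: the numeric values, the 'StyleControls' structure and the 'rowsUseDistinctControls' check are invented for illustration and are not part of Notepad++.

```cpp
#include <cstddef>

// Hypothetical control IDs; the real values live in the project's
// resource headers.
enum ControlId {
    IDC_KEYWORD1_BOLD_CHECK = 101, IDC_KEYWORD1_ITALIC_CHECK = 102,
    IDC_KEYWORD2_BOLD_CHECK = 201, IDC_KEYWORD2_ITALIC_CHECK = 202,
    IDC_KEYWORD3_BOLD_CHECK = 301, IDC_KEYWORD3_ITALIC_CHECK = 302,
    IDC_KEYWORD4_BOLD_CHECK = 401, IDC_KEYWORD4_ITALIC_CHECK = 402
};

struct StyleControls {
    int boldCheck;
    int italicCheck;
};

// One row per keyword group; the parts that differ line up in columns,
// so a repeated name in one row stands out.
const StyleControls kStyleControls[] = {
    {IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK},
    {IDC_KEYWORD2_BOLD_CHECK, IDC_KEYWORD2_ITALIC_CHECK},
    {IDC_KEYWORD3_BOLD_CHECK, IDC_KEYWORD3_ITALIC_CHECK},
    {IDC_KEYWORD4_BOLD_CHECK, IDC_KEYWORD4_ITALIC_CHECK},
};

// Returns true if no row uses the same control for both columns.
bool rowsUseDistinctControls(const StyleControls* rows, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (rows[i].boldCheck == rows[i].italicCheck) {
            return false;
        }
    }
    return true;
}
```

The table still has to be typed (and most likely copied), but the data is now separated from the calls, and a cheap check like the one above can be run in a debug build or a unit test.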

Unfortunately, I do not know any other ways to reduce the number of Copy-Paste errors. You can also use tools that search for duplicate and similar code, but that falls under the advice to use static analyzers.

So I appeal to you, the readers. I would be interested to hear your thoughts on this and your suggestions for other ways to avoid Copy-Paste errors. Perhaps some interesting ideas will come up that many people will find very useful.

References

  1. Steve McConnell. "Code Complete, 2nd Edition". Microsoft Press, paperback, June 2004, 914 pages, ISBN: 0-7356-1967-0. (Section 24.3, Reasons to Refactor)
  2. Presentation "PVS-Studio, a comprehensive solution for developers of modern resource-intensive applications". http://www.viva64.com/ru/pvs-studio-presentation/

Source: https://habr.com/ru/post/112276/

