
I have studied many errors caused by copy-pasted code, and I claim that they are most often made in the last of several similar code fragments. I have never seen this phenomenon described in books, so I decided to write about it. I call it the “last line effect”.
Introduction
My name is Andrey Karpov, and I have an unusual occupation. I check application source code with a static analyzer and describe the errors and weaknesses I find. I do this for pragmatic and selfish reasons: this is how our company promotes its PVS-Studio tool. Find errors. Describe them in an article. Attract attention. Profit. But today's article is not about the analyzer.
While analyzing projects, I save the errors I notice, together with the corresponding code fragments, in a special database. Anyone can browse its contents: we turn it into a set of HTML pages and publish them on our website in the "Identified errors" section.
The database is unique! It currently contains about 1500 code fragments with errors, and it is waiting for people who can identify patterns in them. It could become the basis for many studies and provide material for books and articles.
I have not done any dedicated research on the accumulated material. Nevertheless, one pattern shows up so clearly that I decided to study it in more detail. In my articles I often have to write "pay attention to the last line", and I decided that this is no coincidence.
Last line effect
In a program's source code, you often need to write several similar constructs in a row. Typing the same kind of code several times is boring and inefficient, so programmers use Copy-Paste: a code fragment is copied several times, and then each copy is edited. Everyone knows that this method is bad. It is easy to forget to change something, and then the code contains an error. Unfortunately, there is often no good alternative.
Now about the pattern. I found that the error is most often made in the last copied block of code.
A simple short example:
inline Vector3int32& operator+=(const Vector3int32& other) {
  x += other.x;
  y += other.y;
  z += other.y;
  return *this;
}
Pay attention to the line "z += other.y;". The author forgot to change 'y' to 'z'.
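For comparison, here is a minimal corrected sketch. The Vector3int32 struct below is a simplified, hypothetical stand-in for the real class. Only the operator body follows the fragment above, with the last line fixed to use other.z.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the real Vector3int32 class.
struct Vector3int32 {
    std::int32_t x = 0, y = 0, z = 0;

    // Corrected version: each component is added to the matching component.
    Vector3int32& operator+=(const Vector3int32& other) {
        x += other.x;
        y += other.y;
        z += other.z;  // the original fragment used other.y here
        return *this;
    }
};
```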
The example may look artificial, but this code is taken from a real application. Below I will show convincingly that this is a very common situation. This is exactly what the “last line effect” looks like: a person most often makes a mistake at the very end of a series of similar edits.
I once heard that climbers often fall in the last ten meters of an ascent. This is not because they are tired. They are simply overjoyed that so little is left, and they anticipate the sweet taste of victory over the summit. As a result, they lose focus and make a fatal mistake. Apparently, something similar happens to programmers.
Now a few numbers.
After examining the error database, I identified 84 code fragments that, in my opinion, were written using Copy-Paste. Of these, 41 fragments contain an error somewhere in the middle of the copied blocks. Here is an example:
strncmp(argv[argidx], "CAT=", 4) &&
strncmp(argv[argidx], "DECOY=", 6) &&
strncmp(argv[argidx], "THREADS=", 6) &&
strncmp(argv[argidx], "MINPROB=", 8)) {
The length of the string "THREADS=" is 8 characters, not 6.
In the remaining 43 cases, the error was in the last copied block.
At first glance, 43 is only slightly more than 41. But keep in mind that there are often quite a few similar blocks, and an error can be in the first, second, fifth, or even tenth block. So we get a relatively even distribution of errors across the blocks and a sharp jump at the end.
I assumed that, on average, a fragment contains 5 similar blocks.
Then the first 4 blocks account for 41 errors, or about 10 errors per block.
The last, fifth block accounts for 43 errors!
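Here is the arithmetic behind the estimate, under the same assumption of five similar blocks per fragment. Let e_1..4 be the average number of errors in each of the first four blocks and e_5 the number of errors in the last block:

```latex
\bar{e}_{1\ldots4} = \frac{41}{4} \approx 10.25, \qquad
e_5 = 43, \qquad
\frac{e_5}{\bar{e}_{1\ldots4}} = \frac{43}{10.25} \approx 4.2
```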
For clarity, here is an approximate graph:

Figure 1. A rough graph of the number of errors in five blocks of similar code.
It turns out that the probability of making a mistake in the last copied block is 4 times higher than in any other block.
I do not draw any grand conclusions from this. It is just an interesting observation. From a practical point of view, it is still useful to know about it, because then you can force yourself not to relax at the very end.
Examples
It remains to convince readers that this is not my fantasy but real life. To do so, I will show examples that confirm my words.
Of course, I will not list all of the examples. I will limit myself to the simplest or most revealing ones.
Source Engine SDK
inline void Init( float ix=0, float iy=0,
                  float iz=0, float iw = 0 )
{
  SetX( ix );
  SetY( iy );
  SetZ( iz );
  SetZ( iw );
}
The last call should have been to the SetW() function.
Chromium
if (access & FILE_WRITE_ATTRIBUTES)
  output.append(ASCIIToUTF16("\tFILE_WRITE_ATTRIBUTES\n"));
if (access & FILE_WRITE_DATA)
  output.append(ASCIIToUTF16("\tFILE_WRITE_DATA\n"));
if (access & FILE_WRITE_EA)
  output.append(ASCIIToUTF16("\tFILE_WRITE_EA\n"));
if (access & FILE_WRITE_EA)
  output.append(ASCIIToUTF16("\tFILE_WRITE_EA\n"));
break;
The last block duplicates the penultimate one.
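One way to make this kind of slip less likely is to drive the output from a table, so that each flag and its name appear together in a single row. The sketch below is hedged and is not Chromium's code. The constants kFileWriteData, kFileWriteEa and kFileWriteAttributes are hypothetical placeholders with made-up values, not the real Windows flags. std::string stands in for the UTF-16 output buffer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical placeholder flag values for illustration only.
constexpr std::uint32_t kFileWriteData       = 0x1;
constexpr std::uint32_t kFileWriteEa         = 0x2;
constexpr std::uint32_t kFileWriteAttributes = 0x4;

struct AccessFlagName {
    std::uint32_t flag;
    const char* name;
};

// Each flag is paired with its printable name in exactly one row.
constexpr AccessFlagName kWriteFlags[] = {
    {kFileWriteAttributes, "FILE_WRITE_ATTRIBUTES"},
    {kFileWriteData,       "FILE_WRITE_DATA"},
    {kFileWriteEa,         "FILE_WRITE_EA"},
};

// Appends "\t<NAME>\n" for every flag from the table that is set in 'access'.
std::string DescribeWriteAccess(std::uint32_t access) {
    std::string output;
    for (const auto& entry : kWriteFlags) {
        if (access & entry.flag) {
            output += '\t';
            output += entry.name;
            output += '\n';
        }
    }
    return output;
}
```

A duplicated row is still possible, but it is easier to see in a short table than in a long chain of if statements.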
ReactOS
if (*ScanString == L'\"' ||
    *ScanString == L'^' ||
    *ScanString == L'\"')
The same character, L'\"', is compared twice.
Multi Theft Auto
class CWaterPolySAInterface
{
public:
  WORD m_wVertexIDs[3];
};

CWaterPoly* CWaterManagerSA::CreateQuad (....)
{
  ....
  pInterface->m_wVertexIDs [ 0 ] = pV1->GetID ();
  pInterface->m_wVertexIDs [ 1 ] = pV2->GetID ();
  pInterface->m_wVertexIDs [ 2 ] = pV3->GetID ();
  pInterface->m_wVertexIDs [ 3 ] = pV4->GetID ();
  ....
}
The last line was written out of inertia and is redundant. The array has only 3 elements, so this line writes past its end.
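A more defensive style lets the compiler check the count. The sketch below is hedged: the Vertex type and the FillVertexIDs() helper are hypothetical, and WORD is modeled as std::uint16_t. The template parameter N is tied to both the destination array and the list of vertices, so passing four vertices for a three-element array fails to compile instead of writing out of bounds.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using WORD = std::uint16_t;  // stand-in for the Windows WORD typedef

// Hypothetical vertex type; only GetID() matters here.
struct Vertex {
    WORD id;
    WORD GetID() const { return id; }
};

// Copies one ID per vertex. N must match for both arguments, so a
// mismatched number of vertices is rejected at compile time.
template <std::size_t N>
void FillVertexIDs(std::array<WORD, N>& ids,
                   const std::array<const Vertex*, N>& vertices) {
    for (std::size_t i = 0; i < N; ++i) {
        ids[i] = vertices[i]->GetID();
    }
}
```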
Source Engine SDK
intens.x=OrSIMD(AndSIMD(BackgroundColor.x,no_hit_mask),
                AndNotSIMD(no_hit_mask,intens.x));
intens.y=OrSIMD(AndSIMD(BackgroundColor.y,no_hit_mask),
                AndNotSIMD(no_hit_mask,intens.y));
intens.z=OrSIMD(AndSIMD(BackgroundColor.y,no_hit_mask),
                AndNotSIMD(no_hit_mask,intens.z));
In the last line, the author forgot to replace "BackgroundColor.y" with "BackgroundColor.z".
Trans-Proteomic Pipeline
void setPepMaxProb(....)
{
  ....
  double max4 = 0.0;
  double max5 = 0.0;
  double max6 = 0.0;
  double max7 = 0.0;
  ....
  if ( pep3 ) { ... if ( use_joint_probs && prob > max3 ) ... }
  ....
  if ( pep4 ) { ... if ( use_joint_probs && prob > max4 ) ... }
  ....
  if ( pep5 ) { ... if ( use_joint_probs && prob > max5 ) ... }
  ....
  if ( pep6 ) { ... if ( use_joint_probs && prob > max6 ) ... }
  ....
  if ( pep7 ) { ... if ( use_joint_probs && prob > max6 ) ... }
  ....
}
In the last condition, the author forgot to replace "prob > max6" with "prob > max7".
SeqAn
inline typename Value<Pipe>::Type const & operator*() {
  tmp.i1 = *in.in1;
  tmp.i2 = *in.in2;
  tmp.i3 = *in.in2;
  return tmp;
}
Judging by the pattern, the last assignment should read from in.in3.
SlimDX
for( int i = 0; i < 2; i++ )
{
  sliders[i] = joystate.rglSlider[i];
  asliders[i] = joystate.rglASlider[i];
  vsliders[i] = joystate.rglVSlider[i];
  fsliders[i] = joystate.rglVSlider[i];
}
The last line should have used the rglFSlider array.
Qt
if (repetition == QStringLiteral("repeat") ||
    repetition.isEmpty()) {
  pattern->patternRepeatX = true;
  pattern->patternRepeatY = true;
} else if (repetition == QStringLiteral("repeat-x")) {
  pattern->patternRepeatX = true;
} else if (repetition == QStringLiteral("repeat-y")) {
  pattern->patternRepeatY = true;
} else if (repetition == QStringLiteral("no-repeat")) {
  pattern->patternRepeatY = false;
  pattern->patternRepeatY = false;
} else {
In the very last block, the author forgot about 'patternRepeatX'. It should be:
pattern->patternRepeatX = false;
pattern->patternRepeatY = false;
ReactOS
const int istride = sizeof(tmp[0]) / sizeof(tmp[0][0][0]);
const int jstride = sizeof(tmp[0][0]) / sizeof(tmp[0][0][0]);
const int mistride = sizeof(mag[0]) / sizeof(mag[0][0]);
const int mjstride = sizeof(mag[0][0]) / sizeof(mag[0][0]);
The variable 'mjstride' will always be equal to one. The last line should be:
const int mjstride = sizeof(mag[0][0]) / sizeof(mag[0][0][0]);
Mozilla Firefox
if (protocol.EqualsIgnoreCase("http") ||
    protocol.EqualsIgnoreCase("https") ||
    protocol.EqualsIgnoreCase("news") ||
    protocol.EqualsIgnoreCase("ftp") ||           <<<---
    protocol.EqualsIgnoreCase("file") ||
    protocol.EqualsIgnoreCase("javascript") ||
    protocol.EqualsIgnoreCase("ftp")) {           <<<---
The "ftp" string at the end is suspicious: it has already been compared.
Quake-III-Arena
if (fabs(dir[0]) > test->radius ||
    fabs(dir[1]) > test->radius ||
    fabs(dir[1]) > test->radius)
The value in dir[2] is never checked.
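When the same test applies to every component, a loop over the indices leaves nothing to edit by hand. This is a minimal sketch that assumes dir is a three-element float array, like Quake's vec3_t. AnyComponentExceeds() is a hypothetical helper name.

```cpp
#include <cmath>

// Returns true if the absolute value of any of the three components
// is greater than radius.
bool AnyComponentExceeds(const float dir[3], float radius) {
    for (int i = 0; i < 3; ++i) {
        if (std::fabs(dir[i]) > radius) {
            return true;
        }
    }
    return false;
}
```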
Clang
return (ContainerBegLine <= ContaineeBegLine &&
        ContainerEndLine >= ContaineeEndLine &&
        (ContainerBegLine != ContaineeBegLine ||
         SM.getExpansionColumnNumber(ContainerRBeg) <=
         SM.getExpansionColumnNumber(ContaineeRBeg)) &&
        (ContainerEndLine != ContaineeEndLine ||
         SM.getExpansionColumnNumber(ContainerREnd) >=
         SM.getExpansionColumnNumber(ContainerREnd)));
At the very end, the expression "SM.getExpansionColumnNumber(ContainerREnd)" is compared with itself.
MongoDB
bool operator==(const MemberCfg& r) const {
  ....
  return _id==r._id && votes == r.votes &&
         h == r.h && priority == r.priority &&
         arbiterOnly == r.arbiterOnly &&
         slaveDelay == r.slaveDelay &&
         hidden == r.hidden &&
         buildIndexes == buildIndexes;
}
At the very end, the author forgot the "r." prefix, so buildIndexes is compared with itself.
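A common C++ idiom makes this kind of slip easier to spot: compare two std::tie tuples. The member lists on both sides then line up, so a missing "r." stands out. The sketch below uses a hypothetical, reduced MemberCfg, and its field types are assumptions. std::tie compares the members in order, and every member must support ==.

```cpp
#include <tuple>

// Hypothetical, reduced version of MemberCfg; field types are assumptions.
struct MemberCfg {
    int _id = 0;
    int votes = 1;
    int priority = 1;
    bool arbiterOnly = false;
    bool hidden = false;
    bool buildIndexes = true;

    // Both sides list the same members in the same order.
    bool operator==(const MemberCfg& r) const {
        return std::tie(_id, votes, priority, arbiterOnly, hidden, buildIndexes) ==
               std::tie(r._id, r.votes, r.priority, r.arbiterOnly, r.hidden, r.buildIndexes);
    }
};
```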
Unreal Engine 4
static bool PositionIsInside(....)
{
  return
    Position.X >= Control.Center.X - BoxSize.X * 0.5f &&
    Position.X <= Control.Center.X + BoxSize.X * 0.5f &&
    Position.Y >= Control.Center.Y - BoxSize.Y * 0.5f &&
    Position.Y >= Control.Center.Y - BoxSize.Y * 0.5f;
}
In the last line, the author forgot to make two edits: first, replace ">=" with "<="; second, replace the minus with a plus.
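Four nearly identical comparisons can be avoided by testing, for each axis, the distance from the center against half the box size. The sketch below is hedged: the Vec2 type and this function signature are hypothetical. Unlike the original, it takes the center and box size directly instead of reading them from Control and BoxSize objects. Apart from floating-point rounding, it accepts the same points as the corrected four-comparison version.

```cpp
#include <cmath>

struct Vec2 {  // hypothetical 2D vector type
    float X, Y;
};

// True if Position lies inside the axis-aligned box of size BoxSize
// centered at Center (edges included).
bool PositionIsInside(const Vec2& Position, const Vec2& Center, const Vec2& BoxSize) {
    return std::fabs(Position.X - Center.X) <= BoxSize.X * 0.5f &&
           std::fabs(Position.Y - Center.Y) <= BoxSize.Y * 0.5f;
}
```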
Qt
qreal x = ctx->callData->args[0].toNumber();
qreal y = ctx->callData->args[1].toNumber();
qreal w = ctx->callData->args[2].toNumber();
qreal h = ctx->callData->args[3].toNumber();
if (!qIsFinite(x) || !qIsFinite(y) ||
    !qIsFinite(w) || !qIsFinite(w))
In the last call to the qIsFinite function, the 'h' variable should be passed as the argument.
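One alternative is to put the values into an initializer list and check them with std::all_of. The whole check then fits on one short line that is easy to compare against the declarations, for example if (!AllFinite({x, y, w, h})). In this sketch, qreal is replaced by double and qIsFinite by std::isfinite. AllFinite() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cmath>
#include <initializer_list>

// True if every value in the list is finite (neither infinite nor NaN).
bool AllFinite(std::initializer_list<double> values) {
    return std::all_of(values.begin(), values.end(),
                       [](double v) { return std::isfinite(v); });
}
```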
OpenSSL
if (!strncmp(vstart, "ASCII", 5))
  arg->format = ASN1_GEN_FORMAT_ASCII;
else if (!strncmp(vstart, "UTF8", 4))
  arg->format = ASN1_GEN_FORMAT_UTF8;
else if (!strncmp(vstart, "HEX", 3))
  arg->format = ASN1_GEN_FORMAT_HEX;
else if (!strncmp(vstart, "BITLIST", 3))
  arg->format = ASN1_GEN_FORMAT_BITLIST;
The length of the string "BITLIST" is not 3, but 7 characters.
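Both strncmp errors in this article, the "THREADS=" fragment and this one, come from a length counted by hand. One way to let the compiler do the counting is a hypothetical StartsWith() helper that takes the prefix as a string literal. It uses the literal's array size minus the terminating null as the length. This is a sketch, not OpenSSL's code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: compares the beginning of 'str' with a string literal.
// N includes the terminating '\0', so the prefix length is N - 1.
template <std::size_t N>
bool StartsWith(const char* str, const char (&prefix)[N]) {
    return std::strncmp(str, prefix, N - 1) == 0;
}
```

With a hand-written length of 3, an input such as "BITSTRING" would also match "BITLIST". StartsWith() compares all 7 characters and rejects it.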
I will stop here. I think these examples are more than enough.
Conclusion
In this article, you learned that when using Copy-Paste, the probability of making a mistake in the last copied block is 4 times higher than in any other block.
This is a feature of human psychology, not a matter of professional skill. As we saw in this article, even highly skilled developers of projects such as Clang and Qt tend to make mistakes at the end.
I hope my observation will be useful and will perhaps encourage people to study the accumulated error database. I think doing so will reveal many interesting patterns and lead to new recommendations for programmers.
This article in English
If you want to share this article with an English-speaking audience, please use the link to the translation: Andrey Karpov. The Last Line Effect.