📜 ⬆️ ⬇️

Checking the source code of 7-Zip with PVS-Studio

One of the programs that allows you to solve the problem of data compression is the popular file archiver 7-Zip, and I myself often use it. Readers have long turned to us with a request to check the code of this application. Well, it's time to look at its sources and see what PVS-Studio can find interesting.




')

Introduction


A few words about the project. 7-Zip is a free file archiver with a high degree of data compression, written in C and C ++. It has a small size of 235 thousand lines of code. It supports multiple compression algorithms and a variety of data formats, including the proprietary 7z format with the highly efficient LZMA compression algorithm. The program has been developed since 1999, it is free and has open source code. 7-Zip is the winner of the 2007 SourceForge.net Community Choice Awards in the "Best Project" and "Best Technical Design" categories. Version 16.00 was chosen for verification, the source code of which was downloaded from http://www.7-zip.org/download.html

Test results


To check the 7-Zip code, the PVS-Studio v6.04 static code analyzer was used. For the article, the most interesting analyzer messages were selected and analyzed. Let's take a look at them.

Typos in conditional statements


Typos in conditional statements are often found in programs. In the case of a large number of checks, their detection can cause a lot of trouble. In such cases, a static code analyzer comes to the rescue.

I will give a few examples of this error.

V501 There are identical sub-expressions "Id == k_PPC" operator. 7zupdate.cpp 41
void SetDelta() { if (Id == k_IA64) Delta = 16; else if (Id == k_ARM || Id == k_PPC || Id == k_PPC) //<== Delta = 4; else if (Id == k_ARMT) Delta = 2; else Delta = 0; } 

The analyzer detected the same conditional expression. At best, one of the conditions Id == k_PPC is redundant and does not affect the logic of the program. To correct a typo, you just need to remove this condition, then the correct expression will look like this:
 if (Id == k_IA64) Delta = 16; else if (Id == k_ARM || Id == k_PPC) Delta = 4; 

But even more serious consequences of such a typo are possible, if instead of the k_PPC constant, in one of the recurring conditions, there must be another constant. In this case, the logic of the program may be violated.
Here is another example of a typo in a conditional statement:
V501 There are identical to the left and the right of the | || operator: offs> = nodeSize || offs> = nodeSize hfshandler.cpp 915
 HRESULT CDatabase::LoadCatalog(....) { .... UInt32 nodeSize = (1 << hr.NodeSizeLog); UInt32 offs = Get16(p + nodeOffset + nodeSize - (i + 1) * 2); UInt32 offsNext = Get16(p + nodeOffset + nodeSize - (i + 2) * 2); UInt32 recSize = offsNext - offs; if (offs >= nodeSize || offs >= nodeSize //<== || offsNext < offs || recSize < 6) return S_FALSE; .... } 

Here the problem is in the repeated condition offs> = nodeSize .

Most likely, these typos were obtained when using Copy-Paste to duplicate the code. It makes no sense to urge to abandon copying code sections This is too convenient and useful to deprive yourself of such functionality in the editor. You just need to more carefully check the result.

Identical comparisons


The analyzer detected a potential error in a construct consisting of conditional operators. Here is her example:

V517 The use of if (A) {...} else if (A) {...} 'pattern was detected. There is a possibility of logical error presence. Check lines: 388, 390. archivecommandline.cpp 388
 static void AddRenamePair(...., NRecursedType::EEnum type, ....) { .... if (type == NRecursedType::kRecursed) val.AddAscii("-r"); else if (type == NRecursedType::kRecursed) //<== val.AddAscii("-r0"); .... } 

In the code NRecursedType is defined as follows:
 namespace NRecursedType { enum EEnum { kRecursed, kWildcardOnlyRecursed, kNonRecursed }; } 

It turns out that the second condition will never be fulfilled. Let's try to understand this problem in more detail. Based on the description of the command line parameters, the -r parameter indicates the use of recursion for subdirectories. In the case of the -r0 parameter , recursion is used only for wildcard names. Comparing this with the definition of NRecursedType, we can conclude that in the second case the type NRecursedType :: kWildcardOnlyRecursed should be used . Then the correct code will look like this:
 static void AddRenamePair(...., NRecursedType::EEnum type, ....) { .... if (type == NRecursedType::kRecursed) val.AddAscii("-r"); else if (type == NRecursedType::kWildcardOnlyRecursed) //<== val.AddAscii("-r0"); .... } 

Conditions that are always true or false


It is necessary to closely monitor whether you work with a signed or unsigned type. Ignoring these features can lead to unpleasant consequences.

V547 Expression 'newSize <0' is always false. Unsigned type value is never <0. update.cpp 254

This is a sample code from a program in which this language feature was ignored:
 STDMETHODIMP COutMultiVolStream::SetSize(UInt64 newSize) { if (newSize < 0) //<== return E_INVALIDARG; .... } 

The problem is that newSize is unsigned and the condition will never be met. If a negative value is passed to the SetSize function, this error will be ignored and the function will start using the incorrect size. In 7-Zip, there are 2 more conditions that, due to the confusion with signed / unsigned, are always true or always false.

The same condition is checked twice.


The analyzer has detected a potential error related to the fact that the same condition is checked twice.

V571 Recurring check. The 'if (Result! = ((HRESULT) 0L))' condition was already verified in line 56. extractengine.cpp 58

This is how the program code fragment looks
 void Process2() { .... if (Result != S_OK) { if (Result != S_OK) //<== ErrorMessage = kCantOpenArchive; return; } .... } 

Most likely, in this situation, the second check is simply redundant, but a situation is also possible in which the programmer did not change the second condition after copying, and it turned out to be erroneous.

Similar places in 7-Zip code:

Suspicious pointer handling


It occurs in the code 7-Zip and an error, when the pointer is dereferenced at the beginning, and only then is checked for equality to zero.

V595 The 'outStreamSpec' pointer was used before it was verified against nullptr. Check lines: 753, 755. lzmaalone.cpp 753.

This is a very common mistake in all programs. It usually occurs because of inattention in the process of refactoring code. A null pointer reference will result in undefined program behavior. Consider an application code fragment containing an error of this type:
 static int main2(int numArgs, const char *args[]) { .... if (!stdOutMode) Print_Size("Output size: ", outStreamSpec->ProcessedSize); //<== if (outStreamSpec) //<== { if (outStreamSpec->Close() != S_OK) throw "File closing error"; } .... } 

The outStreamSpec pointer is dereferenceed in the outStreamSpec-> ProcessedSize expression. Then it is checked for equality to zero. One must either check the pointer even higher or the check that occurs below is meaningless. Here is a list of similar potential problem areas in the program code:

Exception inside destructor


If an exception occurs in the program, the stack collapses, during which objects are destroyed by calling destructors. If the destructor of the object being destroyed when the stack collapses throws another exception and this exception leaves the destructor, the C ++ library immediately terminates the program by calling the terminate () function. It follows from this that destructors should never propagate exceptions. The exception thrown inside the destructor must be processed inside the same destructor.

The analyzer issued the following message:

V509 The 'throw' operator inside the destructor should be placed within the try..catch block. Raising exception inside the destructor is illegal. consoleclose.cpp 62

And here's what the destructor for generating an exception looks like:
 CCtrlHandlerSetter::~CCtrlHandlerSetter() { #if !defined(UNDER_CE) && defined(_WIN32) if (!SetConsoleCtrlHandler(HandlerRoutine, FALSE)) throw "SetConsoleCtrlHandler fails"; //<== #endif } 

The V509 message warns that if the CCtrlHandlerSetter object is destroyed during exception handling, then a new exception will immediately crash the program. This code should be rewritten in such a way as to report an error that occurred in the destructor, without using the exception mechanism. If the error is not critical, then it can be ignored.
 CCtrlHandlerSetter::~CCtrlHandlerSetter() { #if !defined(UNDER_CE) && defined(_WIN32) try { if (!SetConsoleCtrlHandler(HandlerRoutine, FALSE)) throw "SetConsoleCtrlHandler fails"; //<== } catch(...) { assert(false); } #endif } 

Bool variable increment


Historically, for variables of type bool an increment operation is allowed, it sets the variable to true . This feature is related to the fact that earlier integer values ​​were used to represent Boolean variables. Subsequently, this feature remained to support backward compatibility programs. Starting with C ++ 98 standard, it is marked as deprecated and is not recommended for use. In the upcoming standard C ++ 17, the ability to use an increment for a boolean variable is marked for deletion.

In the 7-Zip code, a couple of places were found where this outdated feature is used.
 STDMETHODIMP CHandler::GetArchiveProperty(....) { .... bool numMethods = 0; for (unsigned i = 0; i < ARRAY_SIZE(k_Methods); i++) { if (methodMask & ((UInt32)1 << i)) { res.Add_Space_if_NotEmpty(); res += k_Methods[i]; numMethods++; //<== } } if (methodUnknown != 0) { char temp[32]; ConvertUInt32ToString(methodUnknown, temp); res.Add_Space_if_NotEmpty(); res += temp; numMethods++; //<== } if (numMethods == 1 && chunkSizeBits != 0) { .... } .... } 

In this situation, there are two options. Or the variable numMethods is a flag and in this case it is better to use initialization with the Boolean value numMethods = true . Or, judging by the name of the variable, it is a counter that must be an integer.

Check for failed memory allocation


The analyzer detected a situation when the value of the pointer returned by the new operator is compared with zero. As a rule, this means that the program, if it is impossible to allocate memory, will behave differently than the programmer expects.

This is a plug- in. The exception will be generated in the case of memory allocation error. far.cpp 399

Here is how it looks in the program code:
 static HANDLE MyOpenFilePluginW(const wchar_t *name) { .... CPlugin *plugin = new CPlugin( fullName, // defaultName, agent, (const wchar_t *)archiveType ); if (!plugin) return INVALID_HANDLE_VALUE; .... } 

If the new operator could not allocate memory, then, according to the C ++ standard of the language, an exception is generated std :: bad_alloc () . Thus, it does not make sense to check for equality to zero. The plugin pointer will never be zero. The function will never return the constant value INVALID_HANDLE_VALUE . If it is impossible to allocate memory, an exception occurs that is better handled at a higher level, and the test for equality to zero can simply be removed. Well, or if exceptions in the application are undesirable, then you can use the new operator that does not generate exceptions; in this case, you can check the return value to zero. In the application code, there are 3 more such checks:

Constructions requiring optimization


Now a little about the places that can potentially be optimized. An object is passed to the function. This object is passed by value, but it is not modified, since there is a keyword const . Perhaps it would be rational to pass it using a constant reference in C ++ or using a pointer in C.

Here is an example for a vector:

V801 Decreased performance. It is better to redefine it. Consider replacing 'const ... pathParts' with 'const ... & pathParts'. wildcard.cpp 487
 static unsigned GetNumPrefixParts(const UStringVector pathParts) { .... } 

Calling this function will call the copy constructor for the UStringVector class. If such copying of objects occurs frequently, then this can significantly reduce application performance. This code can be easily optimized by adding a link:
 static unsigned GetNumPrefixParts(const UStringVector& pathParts) { .... } 

Here are some more similar places:

Conclusion


7-Zip is a small project that has been developing for quite some time and, of course, failed to find a large number of serious errors. But there are still places in the code that need attention and the PVS-Studio static code analyzer will greatly facilitate this work. If you are developing a project in C, C ++ or C #, I suggest you to download PVS-Studio without delay and check your project: http://www.viva64.com/ru/pvs-studio-download/


If you want to share this article with an English-speaking audience, then please use the link to the translation: Kirill Yudintsev. Checking 7-Zip with PVS-Studio analyzer .

Read the article and have a question?
Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please review the list.

Source: https://habr.com/ru/post/302708/


All Articles