
My readers asked to compare Manticore and Sphinx projects in terms of code quality. I can do this in only one way I have mastered - check projects with the help of the PVS-Studio static analyzer and calculate the error density in the code. So, I checked the C and C ++ code in these projects and, in my opinion, the quality of the Manticore code is higher than the quality of the Sphinx code. Naturally, this is a very narrow view, and I do not pretend to the authenticity of my research. However, I was asked, and I made the comparison as I can.
Sphinx and Manticore
First, let's take a look at the Sphinx and Manticore projects.
Sphinx is a full-text search system developed by Andrei Aksenov and distributed under the GNU GPL license. A distinctive feature is the high speed of indexing and searching, as well as integration with existing DBMS and API for common web programming languages.
I took the source code
from here . The size of the project, if you take the code in C and C ++ and do not include third-party libraries - 156 KLOC. Comments make up 10.2%. This means that the “clean code” is 144 KLOC.
')
Manticore is Sphinx fork. Starting as key members of the original Sphinx team, the Manticore team has set itself the following goal - to deliver fast, stable and powerful free software for full-text search.
I took the source code
from here . The size of the project, if you take the code in C and C ++ and do not include third-party libraries - 170 KLOC. Comments make up 10.1%. This means that the “clean code” is 152 KLOC.
The number of lines of code in the Manticore project is slightly larger, and I will take this into account when assessing the density of errors found.
Comparative analysis
The code of these projects is very similar, and very often the same error is present in one or another project. I’ll say right away that this time I carried out the analysis superficially and studied only general warnings of the High level, issued by the PVS-Studio analyzer.
Why am I too lazy to compare projects more carefully? As I have already said, the projects are very similar, and already when I viewed High-level warnings, I became bored. In general, the picture is clear. The analyzer issued very similar lists of warnings, but only slightly more in the Sphinx project. I think the situation with the warnings of other levels will be exactly the same.
In the article I will consider only some fragments of code with errors that for some reason seemed interesting to me. More detailed analysis of projects can be performed by their developers. I am ready to provide them with temporary license keys.
I suggest that readers also download the demo version of
PVS-Studio and check the code of their projects. I am sure you will find a lot of interesting things in them.
Common mistakes
I will start with the errors that were found in both the Sphinx project and Manticore.
CWE-476: NULL Pointer Dereference
Expr_StrIn_c ( const CSphAttrLocator & tLoc, int iLocator, ConstList_c * pConsts, UservarIntSet_c * pUservar, ESphCollation eCollation ) : Expr_ArgVsConstSet_c<int64_t> ( NULL, pConsts ) , ExprLocatorTraits_t ( tLoc, iLocator ) , m_pStrings ( NULL ) , m_pUservar ( pUservar ) { assert ( tLoc.m_iBitOffset>=0 && tLoc.m_iBitCount>0 ); assert ( !pConsts || !pUservar ); m_fnStrCmp = GetCollationFn ( eCollation ); const char * sExpr = pConsts->m_sExpr.cstr();
I brought a fairly large code snippet, but do not worry, everything is simple. Note the formal argument of
pConsts . This pointer is used in the constructor to initialize the
sExpr variable. At the same time, there is nowhere in the constructor a check in the event that
NULL is passed as an argument, i.e. There is no protection against a null pointer. The variable
pConsts is just bravely
derelated .
Note. There is a check in the form of
assert , but it does not help in the release-version, so such a check can not be considered sufficient.
Now let's take a look at the
CreateInNode function code, where an instance of the
Expr_StrIn_c class is
created :
ISphExpr * ExprParser_t::CreateInNode ( int iNode ) { .... case TOK_ATTR_STRING: return new Expr_StrIn_c ( tLeft.m_tLocator, tLeft.m_iLocator, NULL,
The third actual argument is
NULL . Accordingly, if this code fragment is executed, then the null pointer dereference will occur.
The analyzer signals this error by issuing a warning: V522 Dereferencing of the null pointer 'pConsts' might take place. The null pointer is passed into the Expr_StrIn_c function. Inspect the third argument. Check lines: 5407, 5946. sphinxexpr.cpp 5407
This error is interesting because PVS-Studio performs data-flow analysis, examining the bodies of two different functions. However, it can also perform a much more complicated embedded analysis. Consider this case.
First, we consider the
SendBytes function, in which, in fact, the null pointer will be dereferenced.
void ISphOutputBuffer::SendBytes ( const void * pBuf, int iLen ) { int iOff = m_dBuf.GetLength(); m_dBuf.Resize ( iOff + iLen ); memcpy ( m_dBuf.Begin() + iOff, pBuf, iLen ); }
Notice the
pBuf pointer. It is not checked anywhere and is immediately passed as an actual argument to the
memcpy function. Accordingly, if the
pBuf pointer is zero, then when the
memcpy function is called, it will read at the null pointer.
Why does PVS-Studio decide that there will be an error here? To answer this question, let's go up the call chain above and look at the
SendMysqlOkPacket function.
void SendMysqlOkPacket ( ISphOutputBuffer & tOut, BYTE uPacketID, int iAffectedRows=0, int iWarns=0, const char * sMessage=NULL, bool bMoreResults=false ) { DWORD iInsert_id = 0; char sVarLen[20] = {0}; void * pBuf = sVarLen; pBuf = MysqlPack ( pBuf, iAffectedRows ); pBuf = MysqlPack ( pBuf, iInsert_id ); int iLen = (char *) pBuf - sVarLen; int iMsgLen = 0; if ( sMessage ) iMsgLen = strlen(sMessage) + 1; tOut.SendLSBDword ( (uPacketID<<24) + iLen + iMsgLen + 5); tOut.SendByte ( 0 ); tOut.SendBytes ( sVarLen, iLen ); if ( iWarns<0 ) iWarns = 0; if ( iWarns>65535 ) iWarns = 65535; DWORD uWarnStatus = iWarns<<16; if ( bMoreResults ) uWarnStatus |= ( SPH_MYSQL_FLAG_MORE_RESULTS ); tOut.SendLSBDword ( uWarnStatus ); tOut.SendBytes ( sMessage, iMsgLen ); }
I apologize that I had to bring the whole body of the function. I wanted to show that there is no protection in the function from the fact that the argument
sMessage will be equal to
NULL . The
sMessage pointer
is simply passed to the
SendBytes function.
I also want to note that the default value of the formal argument
sMessage is
NULL :
const char * sMessage=NULL,
This is dangerous in itself. However, the fact that the default argument is
NULL does not mean anything. Perhaps the correct arguments are always passed to the function. Therefore, let's go further:
inline void Ok( int iAffectedRows=0, int iWarns=0, const char * sMessage=NULL, bool bMoreResults=false ) { SendMysqlOkPacket ( m_tOut, m_uPacketID, iAffectedRows, iWarns, sMessage, bMoreResults ); if ( bMoreResults ) m_uPacketID++; }
In the
Ok function
, the
sMessage argument
is simply passed to the
SendMysqlOkPacket function. We continue.
void HandleMysqlMultiStmt (....) { .... dRows.Ok ( 0, 0, NULL, bMoreResultsFollow ); .... }
Here we finish our journey. Only 4 actual arguments are passed to the function. The remaining arguments are defaulted. This means that the fifth parameter,
sMessage , will be
NULL , and the null pointer dereference will occur.
A warning from the PVS-Studio analyzer that indicates this error: V522 Dereferencing of the null pointer 'pBuf' might take place. The null pointer is passed into the 'Ok' function. Inspect the third argument. Check lines: 2567, 12267, 12424, 14979. searchd.cpp 2567
CWE-570: Expression is Always False
To begin, consider the
ESphBinRead enumeration.
enum ESphBinRead { BIN_READ_OK,
As you can see, there are no named constants with negative values.
Just in case, take a look at the
ReadBytes function and make sure that it honestly returns values ​​without any tricks.
ESphBinRead CSphBin::ReadBytes ( void * pDest, int iBytes ) { .... return BIN_READ_EOF; .... return BIN_READ_ERROR; .... return BIN_READ_OK; }
As you can see, all the values ​​returned by the function are greater than or equal to 0. Now it’s the turn of the code with the error:
static void DictReadEntry (....) { .... if ( pBin->ReadBytes ( pKeyword, iKeywordLen )<0 ) { assert ( pBin->IsError() ); return; } .... }
PVS-Studio warning: V547 Expression is always false. sphinx.cpp 22416
Such a check does not make sense. The condition is always false, and as a result, incorrect situations when reading data are not processed. Most likely, it should have been written here:
if ( pBin->ReadBytes ( pKeyword, iKeywordLen ) != BIN_READ_OK)
This code demonstrates that the author of the code only thinks that the program will handle incorrect situations. In fact, I often encounter defects in code that is responsible for handling incorrect / non-standard situations. Therefore, programs often fall when something goes wrong. Error handlers are simply incorrectly written in them.
No mystery why this happens, of course not. Testing such parts of the program is difficult and uninteresting. This is one of those cases where a static analyzer can be a good helper, since it checks the code no matter how often it gets control.
CWE-14: Compiler Clear Buffers Code
static bool GetFileStats (....) { .... struct_stat tStat; memset ( &tStat, 0, sizeof ( tStat ) ); if ( stat ( szFilename, &tStat ) < 0 ) { if ( pError ) *pError = strerror ( errno ); memset ( &tStat, 0, sizeof ( tStat ) );
PVS-Studio warning: V597 The compiler could delete the memset function call, which is used to flush the tStat object. The memset_s () function should be used to erase the private data. sphinx.cpp 19987
The compiler has the right to remove a call to the
memset function, which, in case of an error in the program, should clear the private data in
tStat .
Why does the compiler do this, I have written many times, so I will not repeat. For those who have not yet encountered such situations, I suggest reading the description of the
V597 diagnostic or see the description of the
CWE-14 .
CWE-762: Mismatched Memory Management Routines
First we need to look at the implementation of two macros:
#define SafeDelete(_x) \ { if (_x) { delete (_x); (_x) = nullptr; } } #define SafeDeleteArray(_x) \ { if (_x) { delete [] (_x); (_x) = nullptr; } }
Now, I think it’s easy for you to find the error in this code:
int CSphIndex_VLN::DebugCheck ( FILE * fp ) { .... CSphRowitem * pInlineStorage = NULL; if ( pQword->m_iInlineAttrs ) pInlineStorage = new CSphRowitem [ pQword->m_iInlineAttrs ]; ....
PVS-Studio warning: V611, it was allocated using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] pInlineStorage;'. sphinx.cpp 19178
As you can see, the memory is allocated for the array, and is released, as if only one element was created. Instead of the
SafeDelete macro, the
SafeDeleteArray macro should be used
here .
Unique bugs
Above, I looked at a few errors that reveal themselves in both Sphinx and Manticore code. In this case, of course, there are errors inherent in only one project. Consider for example one such case.
In both projects there is a function
RotateIndexMT . But it is implemented in different ways. In the implementation of the Sphinx project, this function contains a defect CWE-690 (Unchecked Return Value to NULL Pointer Dereference).
First, let's look at the declaration of the
CheckServedEntry function:
static bool CheckServedEntry(const ServedIndex_c * pEntry,
The first argument is a pointer to a constant object. Therefore, the function cannot change this object and the pointer itself.
Now the function containing the error:
static bool RotateIndexMT ( .... ) { .... ServedIndex_c * pServed = g_pLocalIndexes->GetWlockedEntry ( sIndex ); pServed->m_sNewPath = "";
PVS-Studio warning: V595 The 'pServed' pointer was used before it was verified against nullptr. Check lines: 17334, 17337. searchd.cpp 17334
First, the
pServed pointer
is dereferenced. Next, the
CheckServedEntry function is
called , which, as we have already learned, cannot change the
pServed pointer passed as the first actual argument.
Next, the
pServed pointer
is checked for
NULL equality. Aha So the pointer could be null. Therefore, above, before the first pointer dereferencing, you need to add a check.
Another option:
if (pServed) is an extra check
if the pointer is never
NULL . In any case, the code must be corrected.
Let's sum up
The Sphinx project is smaller in size to the Manticore project. At the same time, in the Sphinx project, I noticed more errors and a “wrap code” than in the Manticore project.
Given the size of the projects and the number of defects seen, I got the following result. Take the error density in Manticore for 1. Then, in my Sphinx project, the error density in my estimation is 1.5.
My findings. The density of errors in the Sphinx project is one and a half times higher than in the Manticore project. Consequently, the quality of the Manticore code is better than that of the Sphinx project. Fork turned out better than the original.
I repeat once again that this is my subjective opinion, based on a very small amount of information. The density of errors in the code of some components does not yet speak about the quality and reliability of the project as a whole.
Download and try the PVS-Studio analyzer. It's simple. In the end, even if you write the perfect code, you can always
look for errors in the code of your colleagues :).
Thanks for attention. Subscribe to
Twitter or
RSS to keep abreast of our new publications.
If you want to share this article with an English-speaking audience, then please use the link to the translation
of the Sphinx project