Andrei Karpov believes that the code for the Manticore project is better than the code for the Sphinx project

My readers asked to compare Manticore and Sphinx projects in terms of code quality. I can do this in only one way I have mastered - check projects with the help of the PVS-Studio static analyzer and calculate the error density in the code. So, I checked the C and C ++ code in these projects and, in my opinion, the quality of the Manticore code is higher than the quality of the Sphinx code. Naturally, this is a very narrow view, and I do not pretend to the authenticity of my research. However, I was asked, and I made the comparison as I can.

Sphinx and Manticore

First, let's take a look at the Sphinx and Manticore projects.

Sphinx is a full-text search system developed by Andrei Aksenov and distributed under the GNU GPL license. A distinctive feature is the high speed of indexing and searching, as well as integration with existing DBMS and API for common web programming languages.

I took the source code from here . The size of the project, if you take the code in C and C ++ and do not include third-party libraries - 156 KLOC. Comments make up 10.2%. This means that the “clean code” is 144 KLOC.
')
Manticore is Sphinx fork. Starting as key members of the original Sphinx team, the Manticore team has set itself the following goal - to deliver fast, stable and powerful free software for full-text search.

I took the source code from here . The size of the project, if you take the code in C and C ++ and do not include third-party libraries - 170 KLOC. Comments make up 10.1%. This means that the “clean code” is 152 KLOC.

The number of lines of code in the Manticore project is slightly larger, and I will take this into account when assessing the density of errors found.

Comparative analysis

The code of these projects is very similar, and very often the same error is present in one or another project. I’ll say right away that this time I carried out the analysis superficially and studied only general warnings of the High level, issued by the PVS-Studio analyzer.

Why am I too lazy to compare projects more carefully? As I have already said, the projects are very similar, and already when I viewed High-level warnings, I became bored. In general, the picture is clear. The analyzer issued very similar lists of warnings, but only slightly more in the Sphinx project. I think the situation with the warnings of other levels will be exactly the same.

In the article I will consider only some fragments of code with errors that for some reason seemed interesting to me. More detailed analysis of projects can be performed by their developers. I am ready to provide them with temporary license keys.

I suggest that readers also download the demo version of PVS-Studio and check the code of their projects. I am sure you will find a lot of interesting things in them.

Common mistakes

I will start with the errors that were found in both the Sphinx project and Manticore.

CWE-476: NULL Pointer Dereference

Expr_StrIn_c ( const CSphAttrLocator & tLoc, int iLocator, ConstList_c * pConsts, UservarIntSet_c * pUservar, ESphCollation eCollation ) : Expr_ArgVsConstSet_c<int64_t> ( NULL, pConsts ) , ExprLocatorTraits_t ( tLoc, iLocator ) , m_pStrings ( NULL ) , m_pUservar ( pUservar ) { assert ( tLoc.m_iBitOffset>=0 && tLoc.m_iBitCount>0 ); assert ( !pConsts || !pUservar ); m_fnStrCmp = GetCollationFn ( eCollation ); const char * sExpr = pConsts->m_sExpr.cstr(); // <= .... }

I brought a fairly large code snippet, but do not worry, everything is simple. Note the formal argument of pConsts . This pointer is used in the constructor to initialize the sExpr variable. At the same time, there is nowhere in the constructor a check in the event that NULL is passed as an argument, i.e. There is no protection against a null pointer. The variable pConsts is just bravely derelated .

Note. There is a check in the form of assert , but it does not help in the release-version, so such a check can not be considered sufficient.

Now let's take a look at the CreateInNode function code, where an instance of the Expr_StrIn_c class is created :

 ISphExpr * ExprParser_t::CreateInNode ( int iNode ) { .... case TOK_ATTR_STRING: return new Expr_StrIn_c ( tLeft.m_tLocator, tLeft.m_iLocator, NULL, // <= pUservar, m_eCollation ); .... }

The third actual argument is NULL . Accordingly, if this code fragment is executed, then the null pointer dereference will occur.

The analyzer signals this error by issuing a warning: V522 Dereferencing of the null pointer 'pConsts' might take place. The null pointer is passed into the Expr_StrIn_c function. Inspect the third argument. Check lines: 5407, 5946. sphinxexpr.cpp 5407

This error is interesting because PVS-Studio performs data-flow analysis, examining the bodies of two different functions. However, it can also perform a much more complicated embedded analysis. Consider this case.

First, we consider the SendBytes function, in which, in fact, the null pointer will be dereferenced.

 void ISphOutputBuffer::SendBytes ( const void * pBuf, int iLen ) { int iOff = m_dBuf.GetLength(); m_dBuf.Resize ( iOff + iLen ); memcpy ( m_dBuf.Begin() + iOff, pBuf, iLen ); }

Notice the pBuf pointer. It is not checked anywhere and is immediately passed as an actual argument to the memcpy function. Accordingly, if the pBuf pointer is zero, then when the memcpy function is called, it will read at the null pointer.

Why does PVS-Studio decide that there will be an error here? To answer this question, let's go up the call chain above and look at the SendMysqlOkPacket function.

 void SendMysqlOkPacket ( ISphOutputBuffer & tOut, BYTE uPacketID, int iAffectedRows=0, int iWarns=0, const char * sMessage=NULL, bool bMoreResults=false ) { DWORD iInsert_id = 0; char sVarLen[20] = {0}; void * pBuf = sVarLen; pBuf = MysqlPack ( pBuf, iAffectedRows ); pBuf = MysqlPack ( pBuf, iInsert_id ); int iLen = (char *) pBuf - sVarLen; int iMsgLen = 0; if ( sMessage ) iMsgLen = strlen(sMessage) + 1; tOut.SendLSBDword ( (uPacketID<<24) + iLen + iMsgLen + 5); tOut.SendByte ( 0 ); tOut.SendBytes ( sVarLen, iLen ); if ( iWarns<0 ) iWarns = 0; if ( iWarns>65535 ) iWarns = 65535; DWORD uWarnStatus = iWarns<<16; if ( bMoreResults ) uWarnStatus |= ( SPH_MYSQL_FLAG_MORE_RESULTS ); tOut.SendLSBDword ( uWarnStatus ); tOut.SendBytes ( sMessage, iMsgLen ); }

I apologize that I had to bring the whole body of the function. I wanted to show that there is no protection in the function from the fact that the argument sMessage will be equal to NULL . The sMessage pointer is simply passed to the SendBytes function.

I also want to note that the default value of the formal argument sMessage is NULL :

 const char * sMessage=NULL,

This is dangerous in itself. However, the fact that the default argument is NULL does not mean anything. Perhaps the correct arguments are always passed to the function. Therefore, let's go further:

 inline void Ok( int iAffectedRows=0, int iWarns=0, const char * sMessage=NULL, bool bMoreResults=false ) { SendMysqlOkPacket ( m_tOut, m_uPacketID, iAffectedRows, iWarns, sMessage, bMoreResults ); if ( bMoreResults ) m_uPacketID++; }

In the Ok function , the sMessage argument is simply passed to the SendMysqlOkPacket function. We continue.

 void HandleMysqlMultiStmt (....) { .... dRows.Ok ( 0, 0, NULL, bMoreResultsFollow ); .... }

Here we finish our journey. Only 4 actual arguments are passed to the function. The remaining arguments are defaulted. This means that the fifth parameter, sMessage , will be NULL , and the null pointer dereference will occur.

A warning from the PVS-Studio analyzer that indicates this error: V522 Dereferencing of the null pointer 'pBuf' might take place. The null pointer is passed into the 'Ok' function. Inspect the third argument. Check lines: 2567, 12267, 12424, 14979. searchd.cpp 2567

CWE-570: Expression is Always False

To begin, consider the ESphBinRead enumeration.

 enum ESphBinRead { BIN_READ_OK, ///< bin read ok BIN_READ_EOF, ///< bin end BIN_READ_ERROR, ///< bin read error BIN_PRECACHE_OK, ///< precache ok BIN_PRECACHE_ERROR ///< precache failed };

As you can see, there are no named constants with negative values.

Just in case, take a look at the ReadBytes function and make sure that it honestly returns values without any tricks.

 ESphBinRead CSphBin::ReadBytes ( void * pDest, int iBytes ) { .... return BIN_READ_EOF; .... return BIN_READ_ERROR; .... return BIN_READ_OK; }

As you can see, all the values returned by the function are greater than or equal to 0. Now it’s the turn of the code with the error:

 static void DictReadEntry (....) { .... if ( pBin->ReadBytes ( pKeyword, iKeywordLen )<0 ) { assert ( pBin->IsError() ); return; } .... }

PVS-Studio warning: V547 Expression is always false. sphinx.cpp 22416

Such a check does not make sense. The condition is always false, and as a result, incorrect situations when reading data are not processed. Most likely, it should have been written here:

 if ( pBin->ReadBytes ( pKeyword, iKeywordLen ) != BIN_READ_OK)

This code demonstrates that the author of the code only thinks that the program will handle incorrect situations. In fact, I often encounter defects in code that is responsible for handling incorrect / non-standard situations. Therefore, programs often fall when something goes wrong. Error handlers are simply incorrectly written in them.

No mystery why this happens, of course not. Testing such parts of the program is difficult and uninteresting. This is one of those cases where a static analyzer can be a good helper, since it checks the code no matter how often it gets control.

CWE-14: Compiler Clear Buffers Code

 static bool GetFileStats (....) { .... struct_stat tStat; memset ( &tStat, 0, sizeof ( tStat ) ); if ( stat ( szFilename, &tStat ) < 0 ) { if ( pError ) *pError = strerror ( errno ); memset ( &tStat, 0, sizeof ( tStat ) ); // <= return false; } .... }

PVS-Studio warning: V597 The compiler could delete the memset function call, which is used to flush the tStat object. The memset_s () function should be used to erase the private data. sphinx.cpp 19987

The compiler has the right to remove a call to the memset function, which, in case of an error in the program, should clear the private data in tStat .

Why does the compiler do this, I have written many times, so I will not repeat. For those who have not yet encountered such situations, I suggest reading the description of the V597 diagnostic or see the description of the CWE-14 .

CWE-762: Mismatched Memory Management Routines

First we need to look at the implementation of two macros:

 #define SafeDelete(_x) \ { if (_x) { delete (_x); (_x) = nullptr; } } #define SafeDeleteArray(_x) \ { if (_x) { delete [] (_x); (_x) = nullptr; } }

Now, I think it’s easy for you to find the error in this code:

 int CSphIndex_VLN::DebugCheck ( FILE * fp ) { .... CSphRowitem * pInlineStorage = NULL; if ( pQword->m_iInlineAttrs ) pInlineStorage = new CSphRowitem [ pQword->m_iInlineAttrs ]; .... // cleanup SafeDelete ( pInlineStorage ); .... }

PVS-Studio warning: V611, it was allocated using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] pInlineStorage;'. sphinx.cpp 19178

As you can see, the memory is allocated for the array, and is released, as if only one element was created. Instead of the SafeDelete macro, the SafeDeleteArray macro should be used here .

Unique bugs

Above, I looked at a few errors that reveal themselves in both Sphinx and Manticore code. In this case, of course, there are errors inherent in only one project. Consider for example one such case.

In both projects there is a function RotateIndexMT . But it is implemented in different ways. In the implementation of the Sphinx project, this function contains a defect CWE-690 (Unchecked Return Value to NULL Pointer Dereference).

First, let's look at the declaration of the CheckServedEntry function:

 static bool CheckServedEntry(const ServedIndex_c * pEntry, // <= const char * sIndex, CSphString & sError );

The first argument is a pointer to a constant object. Therefore, the function cannot change this object and the pointer itself.

Now the function containing the error:

 static bool RotateIndexMT ( .... ) { .... ServedIndex_c * pServed = g_pLocalIndexes->GetWlockedEntry ( sIndex ); pServed->m_sNewPath = ""; // <= if ( !CheckServedEntry ( pServed, sIndex.cstr(), sError ) ) { if ( pServed ) // <= pServed->Unlock(); return false; } .... }

PVS-Studio warning: V595 The 'pServed' pointer was used before it was verified against nullptr. Check lines: 17334, 17337. searchd.cpp 17334

First, the pServed pointer is dereferenced. Next, the CheckServedEntry function is called , which, as we have already learned, cannot change the pServed pointer passed as the first actual argument.

Next, the pServed pointer is checked for NULL equality. Aha So the pointer could be null. Therefore, above, before the first pointer dereferencing, you need to add a check.

Another option: if (pServed) is an extra check if the pointer is never NULL . In any case, the code must be corrected.

Let's sum up

The Sphinx project is smaller in size to the Manticore project. At the same time, in the Sphinx project, I noticed more errors and a “wrap code” than in the Manticore project.

Given the size of the projects and the number of defects seen, I got the following result. Take the error density in Manticore for 1. Then, in my Sphinx project, the error density in my estimation is 1.5.

My findings. The density of errors in the Sphinx project is one and a half times higher than in the Manticore project. Consequently, the quality of the Manticore code is better than that of the Sphinx project. Fork turned out better than the original.

I repeat once again that this is my subjective opinion, based on a very small amount of information. The density of errors in the code of some components does not yet speak about the quality and reliability of the project as a whole.

Download and try the PVS-Studio analyzer. It's simple. In the end, even if you write the perfect code, you can always look for errors in the code of your colleagues :).

Thanks for attention. Subscribe to Twitter or RSS to keep abreast of our new publications.