Comparison of Firebird, MySQL and PostgreSQL code quality

Today’s article is somewhat unusual. At least for the reason that instead of analyzing one project, we will look for errors in three at once, and also see where the most interesting bugs are. And the most interesting is that we will find out who the young man is and write the highest quality code. So, on the agenda - analysis of errors in the code of projects Firebird, MySQL and PostgreSQL.

In a nutshell about projects

Firebird

Firebird (FirebirdSQL) is a cross-platform database management system (DBMS) running on Mac OS X, Linux, Microsoft Windows and a variety of Unix platforms.

Firebird has been used in various industrial systems (warehouse and business, financial, and public sectors) since 2001. This is a commercially independent project of C and C ++ programmers, technical advisors.
')
Additional information:

official website ;
GitHub repository
the number of stars on GitHub is 133;
the number of forks on GitHub is 51.

Mysql

MySQL is a free relational database management system. Typically, MySQL is used as a server that is accessed by local or remote clients, but the distribution package includes an internal server library that allows you to include MySQL in standalone programs.

The flexibility of MySQL DBMS is supported by the support of a large number of table types: users can select both MyISAM type tables that support full-text search and InnoDB tables that support transactions at the level of individual records. Moreover, MySQL DBMS comes with a special type of table EXAMPLE, demonstrating the principles of creating new types of tables. Due to the open architecture and GPL-licensing, new types of tables are constantly appearing in MySQL.

Additional information :

official website ;
GitHub repository
the number of stars on GitHub is 2179;
The number of forks on GitHub is 907.

PostgreSQL

PostgreSQL is a free object-relational database management system (DBMS).

There are implementations for many UNIX-like platforms, including AIX, various BSD systems, HP-UX, IRIX, Linux, macOS, Solaris / OpenSolaris, Tru64, QNX, and also for Microsoft Windows. It can handle various volumes of data, ranging from small personal applications to large Internet applications (data warehouses) with many concurrent users.

PostgreSQL is based on Postgres, a non-profit DBMS developed as an open-source project at the University of California at Berkeley.

Additional information :

official website ;
"Mirror" repository on GitHub ;
the number of stars on GitHub is 3260;
the number of forks on GitHub is 1107.

PVS-Studio

The PVS-Studio static code analyzer was used as an error search tool. PVS-Studio is a source code analyzer for programming languages C, C ++, C #, which helps reduce the cost of software development by early detection of errors, defects and potential vulnerabilities in the program code. Works in Windows and Linux environments.

Download links:

Windows version .
Linux version .

Due to the fact that all 3 projects are simply collected and contain .sln files (immediately, or after generation via CMake), the task of the analysis itself becomes quite trivial - it is enough to run the test through the PVS-Studio plugin interface embedded in IDE Visual Studio.

Comparison criteria

Before looking at what interesting was found in the projects, it is necessary to determine one of the main issues of the article - by what criteria to perform the comparison.

Why is the 'direct' comparison not the best idea?

Comparing head-on by the number of warnings issued by the analyzer (or rather, by the number of warnings to the number of lines of code) is not the best idea, even though this is the least labor-intensive way. Why? Take for example the PostgreSQL project, where the number of GA warnings with a high confidence level is 611. If you set filtering by the diagnostic rule code ( V547 ) and message parts ( ret <0 ) in the PVS-Studio window, you can see that there are 419 such warnings! Too much ... This immediately suggests that somewhere there is a source of these warnings, for example, a macro, or the code is automatically generated. Comments at the beginning of the files containing these warnings confirm the correctness of the theory:

/* This file was generated automatically by the Snowball to ANSI C compiler */

Now, knowing that the code was automatically generated, there are 2 ways:

Suppress all warnings because the auto-generated code is not interesting. Thus, the number of warnings (GA, Lvl1) is immediately reduced by 69%!
Accept that errors even in auto-generated code are still errors, and try to do something about it (fix the code generation script, for example). In this case, the number of warnings remains relevant.

Another pitfall is errors in third-party components used in projects. Yet again:

We can say that such errors are not your headache. Would users agree with this statement?
Take responsibility.

These are just a couple of examples that can put forward the problem of choice, the solution of which can change (sometimes significantly) the number of actual warnings.

Another way

We will immediately agree that we will not consider warnings of the 3rd confidence level (low certainty). These are not the things you should pay attention to first. Undoubtedly, there may be something useful, but when writing articles and when starting to use static analysis, it makes sense to ignore level 3 warnings.

I will not perform a full-scale comparison, as this work is very laborious for many reasons. To take at least a preliminary adjustment of the analyzer for each project, viewing and analyzing hundreds of warnings - all this is very long, and what will be the efficiency - an open question.

Therefore, we proceed in another way. I will look at the logs of all three projects, try to find some of the most interesting errors and analyze them, looking at the same time if there is something similar in other projects.

In addition, relatively recently, we began to glance towards the search for security issues. Even the article was on this topic - " How can PVS-Studio help in finding vulnerabilities? ". Given that one of the participants in this review, MySQL, got into the article mentioned above, I was interested to check whether it would be possible to find a similar move. No tricks - just further see the PVS-Studio warnings, similar to those in the article about vulnerabilities.

Summarizing the above, I will evaluate the quality of the code according to the following criteria:

I will take the analyzer warning numbers from the aforementioned vulnerability article and will look for similar warnings in all three projects. I think this approach is understandable - since we know that such code may be a vulnerability (although not always), you should pay special attention to it.
I look at the GA warnings of the analyzer of the first two confidence levels, select the ones that seem most interesting, and then check to see if there is something similar in other projects.

According to the results of these checks, we will record penalty points in the piggy bank project. Accordingly, whoever scores less points has the best code in relation to the approaches described above. Of course, there are nuances, but I will describe them as we analyze and at the time of summing up.

Well, let's get started!

Debriefing

General analysis results

The table below shows the general results of the analysis of projects taken “as is” - without suppressing false warnings, filtering by directories, etc. Please note that these are only general purpose warnings (General analysis).

Project	High certainty	Medium Certainty	Low certainty	Total
Firebird	156	680	1045	1881
Mysql	902	1448	2925	5275
PostgreSQL	611	1432	1576	3619

However, one should not judge the quality of the code on this table. I mentioned reasons above, I repeat:

lack of preliminary setup of the analyzer;
false positives are not suppressed;
different size of the code base;
during the writing of the article, changes were made to the analyzer, so that at the time of the beginning and end of the writing of the article, the results in this table may differ.

As for the density of warnings (not errors!), Taken without first setting up the analyzer, that is, the ratio of the number of warnings to LOC - it is approximately equal to Firebird and PostgreSQL, and slightly higher at MySQL. But let's not make hasty conclusions, because, as you know, the devil is in the details.

Problems mashing private data

The V597 warning signals the presence of such a call to the memset function that performs data cleansing , which can be removed by the compiler during optimization. Because of this, private data may remain uncleaned. The problem is described in more detail in the documentation for the diagnostic rule .

Neither Firebird nor PostgreSQL showed any such warning, which cannot be said about MySQL. Look at the suspicious code of this project.

 extern "C" char * my_crypt_genhash(char *ctbuffer, size_t ctbufflen, const char *plaintext, size_t plaintext_len, const char *switchsalt, const char **params) { int salt_len; size_t i; char *salt; unsigned char A[DIGEST_LEN]; unsigned char B[DIGEST_LEN]; unsigned char DP[DIGEST_LEN]; unsigned char DS[DIGEST_LEN]; .... (void) memset(A, 0, sizeof (A)); (void) memset(B, 0, sizeof (B)); (void) memset(DP, 0, sizeof (DP)); (void) memset(DS, 0, sizeof (DS)); return (ctbuffer); }

PVS-Studio warnings :

V597 The compiler couldn’t delete the memset function call, which is used to flush 'A' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_genhash_impl.cc 420
V597 The compiler couldn’t delete the memset function call, which is used to flush 'B' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_genhash_impl.cc 421
V597 The compiler couldn’t delete the memset function call, which is used to flush 'DP' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_genhash_impl.cc 422
V597 The compiler couldn’t delete the memset function call, which is used to flush 'DS' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_genhash_impl.cc 423

The analyzer immediately found 4 buffers in one function (!) For which forced data cleansing should be performed, and which, at the same time, may not occur. Then nullable (in theory) data will remain in memory in the form of “as is”. The lack of further use of buffers A , B , DP , DS allows the compiler to remove the memset function call, since such a change does not affect the behavior of the program from the point of view of C / C ++. This problem is described in more detail in the article " Safe cleaning of private data ".

Other warnings are similar, so I will not analyze them. I will give them a list:

V597 The compiler could delete the memset function call, which is used to flush the table_list object. The RtlSecureZeroMemory () function should be used to erase the private data. sql_show.cc 630
V597 The compiler couldn’t delete the memset function call, which is used to flush 'W' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha.cpp 413
V597 The compiler couldn’t delete the memset function call, which is used to flush 'W' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha.cpp 490
V597 The compiler couldn’t delete the memset function call, which is used to flush 'T' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha.cpp 491
V597 The compiler couldn’t delete the memset function call, which is used to flush 'W' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha.cpp 597
V597 The compiler couldn’t delete the memset function call, which is used to flush 'T' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha.cpp 598

But a slightly more interesting case.

 void win32_dealloc(struct event_base *_base, void *arg) { struct win32op *win32op = arg; .... memset(win32op, 0, sizeof(win32op)); free(win32op); }

PVS-Studio warning : V597 The compiler could delete the memset function call, which is used to flush the win32op object. The RtlSecureZeroMemory () function should be used to erase the private data. win32.c 442

Here the situation is similar, but after resetting the data in memory, the corresponding pointer is passed to the free function. Despite this, the compiler can still remove the memset call, leaving only the call to the function to free the memory block ( free ). As a result, the data that should be reset can remain in memory. More information can be found in the above mentioned article.

Scoring. Quite a serious mistake, there is more, not found in a single copy. 3 MySQL penalty points.

Failure to check the validity of the pointer returned by malloc and similar functions.

V769 warnings were issued for all three projects.

Firebird: high certainty - 0; medium certainty - 0; low certainty - 9;
MySQL: high certainty - 0; medium certainty - 13; low certainty - 103;
PostgreSQL: high certainty - 1 medium certainty - 2; low certainty - 24.

Since we agreed not to consider the third level, Firebird is immediately eliminated (in a good way) from comparison. All 3 warnings on PostgreSQL code also turned out to be irrelevant. But with MySQL, everything is not so clear. There were false positives there as well, but some warnings are quite interesting.

 bool Gcs_message_stage_lz4::apply(Gcs_packet &packet) { .... unsigned char *new_buffer = (unsigned char*) malloc(new_capacity); unsigned char *new_payload_ptr = new_buffer + fixed_header_len + hd_len; // compress payload compressed_len= LZ4_compress_default((const char*)packet.get_payload(), (char*)new_payload_ptr, static_cast<int>(old_payload_len), compress_bound); .... }

PVS-Studio warning : V769 The 'new_buffer' pointer in the 'new_buffer + fixed_header_len' expression could be nullptr. In such a case, it will not be used. Check lines: 74, 73. gcs_message_stage_lz4.cc 74

The malloc function, if it failed to return the requested memory block, returns a null pointer, which can be written to the variable new_buffer . Further, when initializing the value of the variable new_payload_ptr , the value of the pointer new_buffer is summed with the values of the variables fixed_header_len and hd_len . That's all, the point of no return for the pointer new_payload_ptr : if somewhere further (for example, in another function) we want to check the pointer's validity by comparing it with NULL , such a test will not help. You can judge the consequences yourself. Therefore, before initializing new_payload_ptr, you should check that new_buffer is not a null pointer.

Someone may argue - why check the return value of malloc to NULL if we could not get the necessary block of memory? All the same, further normal work is impossible, therefore, let the application fall when further working with this pointer, for example.

Due to the fact that some developers adhere to this position, it has a right to exist, but how correct is this approach? After all, you can try to somehow handle this situation in order, for example, not to lose data, or to 'fall more gently'. In addition, such code becomes potentially vulnerable, since if the operation does not occur directly with a null pointer, but with another block of memory ( null pointer + value ), the application may well damage some data. Moreover, all this is another way to add vulnerability to the application. Do you need it? Pros, cons and final decision, I think everyone will make for himself.

I advise you to stick with the second approach, and the V769 diagnostic rule will help you in detecting such situations.

If you decide that such functions can never return NULL , you can inform the analyzer of this in order not to receive the corresponding warnings. For information on how to do this, see the article " Advanced diagnostics ".

Scoring. Given the above, MySQL gets 1 penalty point.

Using a potentially null pointer

V575 warnings were issued for all 3 projects.

An example of an error from the Firebird project (medium certainty):

 static void write_log(int log_action, const char* buff) { .... log_info* tmp = static_cast<log_info*>(malloc(sizeof(log_info))); memset(tmp, 0, sizeof(log_info)); .... }

PVS-Studio warning : V575 The potential null pointer is passed to 'memset' function. Inspect the first argument. Check lines: 1106, 1105. iscguard.cpp 1106

The problem is similar to the one mentioned above - the return value of the malloc function is not checked. If the requested amount of memory could not be allocated, malloc will return a null pointer, which will then be passed to the memset function.

Similar code from the MySQL project:

 Xcom_member_state::Xcom_member_state(....) { .... m_data_size= data_size; m_data= static_cast<uchar *>(malloc(sizeof(uchar) * m_data_size)); memcpy(m_data, data, m_data_size); .... }

PVS-Studio warning : V575 The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 43, 42. gcs_xcom_state_exchange.cc 43

The error is similar to the problem described above from Firebird. Just in case, I remind you that there are places where the return value of malloc is checked for NULL inequality. But this does not apply to them.

Similar code was also found in PostgreSQL:

 static void ecpg_filter(const char *sourcefile, const char *outfile) { .... n = (char *) malloc(plen); StrNCpy(n, p + 1, plen); .... }

PVS-Studio warning : V575 The potential null pointer is passed into the 'strncpy' function. Inspect the first argument. Check lines: 66, 65. pg_regress_ecpg.c 66

There were, however, more interesting high-certainty level warnings on MySQL and PostgreSQL projects.

MySQL snippet:

 View_change_event::View_change_event(char* raw_view_id) : Binary_log_event(VIEW_CHANGE_EVENT), view_id(), seq_number(0), certification_info() { memcpy(view_id, raw_view_id, strlen(raw_view_id)); }

PVS-Studio warning : V575 The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. control_events.cpp 830

Using the memcpy function, copy the string from raw_view_id to view_id , the number of bytes copied is calculated using the strlen function. The caveat is that strlen returns the length of the string without taking into account the terminal zero, therefore, it will not be copied. It should be borne in mind that now, if you don’t add the terminal zero yourself, the functions of working with strings will incorrectly work with view_id . To copy strings correctly, you would use the strcpy / strcpy_s functions .

It would seem that similar code from PostgreSQL:

 static int PerformRadiusTransaction(char *server, char *secret, char *portstr, char *identifier, char *user_name, char *passwd) { .... uint8 *cryptvector; .... cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH); memcpy(cryptvector, secret, strlen(secret)); }

PVS-Studio warning : V575 The 'memcpy' function doesn’t copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. auth.c 2956

There is an interesting difference from the previous case. The type of variable cryptvector is uint8 * . Despite the fact that uint8 is a pseudonym for the unsigned char , it seems to me that the explicit intention is to show that it will not work with data as with a string. Therefore, in this context, such an operation is permissible and not alarming as the previous one.

I met, however, and the code that seemed less secure.

 int intoasc(interval * i, char *str) { char *tmp; errno = 0; tmp = PGTYPESinterval_to_asc(i); if (!tmp) return -errno; memcpy(str, tmp, strlen(tmp)); free(tmp); return 0; }

PVS-Studio warning : V575 The 'memcpy' function doesn’t copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. informix.c 677

The situation is similar to that described above, but closer to the code from MySQL — strings are used, the contents of which (with the exception of terminal zero) are copied into memory used somewhere outside ...

Scoring. Firebird - 1 penalty point, PostgreSQL and MySQL - 3 penalty points, (1 - warnings of medium confidence level, 2 - for high confidence level).

Potentially dangerous use of formatting functions

V618 warnings were issued only to the code in the Firebird project.

Consider an example:

 static const char* const USAGE_COMP = " USAGE IS COMP"; static void gen_based( const act* action) { .... fprintf(gpreGlob.out_file, USAGE_COMP); .... }

PVS-Studio warning : V618 It’s dangerous to call the format. The example of the safe code: printf ("% s", str); cob.cpp 1020

The analyzer was alarmed by the fact that the formatted output function ( fprintf ) is used, but the string is printed directly, without using the format string with the corresponding specifiers. This can be dangerous, and even cause a vulnerability (see CVE-2013-4258 ) in cases where the format specifiers are found in the printed line. Here, the USAGE_COMP string is explicitly defined in the source code and does not contain format specifiers, so such use can be considered valid.

In other places the situation is similar: the printed lines were hard-coded and did not contain format specifiers.

Scoring. Due to the fact that it is written above, I decided not to “fine” Firebird.

Suspicious use of enumeration items

Sample code from MySQL.

 enum wkbType { wkb_invalid_type= 0, wkb_first= 1, wkb_point= 1, wkb_linestring= 2, wkb_polygon= 3, wkb_multipoint= 4, wkb_multilinestring= 5, wkb_multipolygon= 6, wkb_geometrycollection= 7, wkb_polygon_inner_rings= 31, wkb_last=31 }; bool append_geometry(....) { .... if (header.wkb_type == Geometry::wkb_multipoint) .... else if (header.wkb_type == Geometry::wkb_multipolygon) .... else if (Geometry::wkb_multilinestring) .... else DBUG_ASSERT(false); .... }

PVS-Studio warning : V768 The enumeration constant 'wkb_multilinestring' is used as a variable of a Boolean-type. item_geofunc.cc 1887

In principle, the warning text speaks for itself. If you look at conditional expressions, you can see that two are comparisons of header.wkb_type with elements of the Geomerty enumeration, and the third conditional expression is the element of the enumeration. Since Geometry :: wkb_multilinestring is set to 5 , the body of this conditional statement will always be executed when the two previous checks fail. Thus, the else branch containing the DBUG_ASSERT macro will never be executed. Obviously, the correct form of the third conditional expression is the following:

 header.wkb_type == Geometry::wkb_multilinestring

What about the other projects? In PostgreSQL, there were no such warnings, but in Firebird - as many as 9. True, these warnings are already at a lower level (medium certainty), and the pattern detected is also different.

The search patterns for errors detected by diagnostic rule V768 are as follows:

High certainty: use enumeration members as logical type expressions.
Medium certainty: use of variables that have an enumeration type, as boolean expressions.

Consequently, if in the first case it does not work out, then you can somehow argue with the warnings of the analyzer at the second level of reliability.

For example, most cases are something like this:

 enum att_type { att_end = 0, .... }; void fix_exception(...., att_type& failed_attrib, ....) { .... if (!failed_attrib) .... }

PVS-Studio warning : V768 The variable 'failed_attrib' is of enum type. It is odd that it is used as a variable of a Boolean-type. restore.cpp 8580

The analyzer considered a suspicious code, which thus checks that the variable failed_attrib is att_type :: att_end . I would, for example, prefer an explicit comparison with an enumeration item. However, I cannot say that this code is wrong. Yes, I do not like this style (and the analyzer, too), but the code is valid.

But there are 2 places that look more suspicious. The pattern is the same, so consider only one case.

 namespace EDS { .... enum TraScope {traAutonomous = 1, traCommon, traTwoPhase}; .... } class ExecStatementNode : .... { .... EDS::TraScope traScope; .... }; void ExecStatementNode::genBlr(DsqlCompilerScratch* dsqlScratch) { .... if (traScope) .... .... }

PVS-Studio warning : V768 The variable 'traScope' is of enum type. It is odd that it is used as a variable of a Boolean-type. stmtnodes.cpp 3448

The code is similar to the previous one - they also wanted to check that the traScope variable contains the value of the enumeration element with the actual non-zero value. But here, unlike the previous example, there are no enumeration elements with the actual value '0'. So this code looks more suspicious than the previous one.

, , MySQL — 10 .

. Firebird 1 , MySQL — 2.

, . , , .

 struct win32op { int fd_setsz; struct win_fd_set *readset_in; struct win_fd_set *writeset_in; struct win_fd_set *readset_out; struct win_fd_set *writeset_out; struct win_fd_set *exset_out; RB_HEAD(event_map, event_entry) event_root; unsigned signals_are_broken : 1; }; void win32_dealloc(struct event_base *_base, void *arg) { struct win32op *win32op = arg; .... memset(win32op, 0, sizeof(win32op)); free(win32op); }

PVS-Studio : V579 The memset function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. win32.c 442

memset . sizeof , , , sizeof , .

- , , memset , '' .

— , . , . , V501 C/C++ V3001 C# .

V579 .

. 2 MySQL.

. MySQL.

 typedef char Error_message_buf[1024]; const char* get_last_error_message(Error_message_buf buf) { int error= GetLastError(); buf[0]= '\0'; FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM, NULL, error, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPTSTR)buf, sizeof(buf), NULL ); return buf; }

PVS-Studio : V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (buf)' expression. common.cc 507

Error_message_buf — 1024 char . —

 const char* get_last_error_message(char buf[1024])

buf — , , — . , sizeof(buf) , . — 4 8 1024.

Firebird PostgreSQL .

. 2 MySQL.

throw

, … MySQL. , .

 mysqlx::XProtocol* active() { if (!active_connection) std::runtime_error("no active session"); return active_connection.get(); }

PVS-Studio : V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw runtime_error(FOO); mysqlxtest.cc 509

std::runtime_error , . , , throw . — ( active_connection == nullptr ) .

Firebird, PostgreSQL .

. 2 MySQL.

Firebird.

 class Message { .... void createBuffer(Firebird::IMessageMetadata* aMeta) { unsigned l = aMeta->getMessageLength(&statusWrapper); check(&statusWrapper); buffer = new unsigned char[l]; } .... ~Message() { delete buffer; .... } ..... unsigned char* buffer; .... };

PVS-Studio : V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] buffer;'. Check lines: 101, 237. message.h 101

( buffer — Message ) — createBuffer , , , new[] . delete delete[] .

MySQL PostgreSQL .

. 2 Firebird.

Summing up

, :

Firebird: 1 + 1 + 2 = 4 .
MySQL: 3 + 1 + 2 + 2 + 2 + 2 = 12 .
PostgreSQL: 3 .

, , . ( ) … MySQL! , — , !

Firebird PostgreSQL . — — , — , , V768 … — PostgreSQL , 4 …

, 'i', Firebird PostgreSQL, , , , , . , - , …

, :

1 — Firebird PostgreSQL.
2 — MySQL.

, , , — , - ( , , Firebird PostgreSQL, MySQL).

? , . , - ? PVS-Studio ! ? ;)

, : Sergey Vasiliev. Code Quality Comparison of Firebird, MySQL, and PostgreSQL

Read the article and have a question?

Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please review the list.

Source: https://habr.com/ru/post/343410/

All Articles