📜 ⬆️ ⬇️

Compare the implementation of the languages ​​of Python and Ruby by the density of errors

Which programming language to start learning: Python or Ruby? Which one is better? Django or Ruby on Rails? Such questions can often be found in IT forums around the world. I propose to compare not the languages ​​themselves, but their reference implementations: CPython and MRI. About what errors in their interpreters could be found by PVS-Studio, and will be discussed in this article.




')

Introduction


The latest sources from the main repositories ( Ruby , Python ) were taken for analysis. The check was performed using the PVS-Studio static code analyzer version 6.06. Python is great at Visual Studio, and for Ruby you can use the Standalone version in compiler call monitoring mode.

There were not so many overt errors in the projects: most of the positives are related to the use of macros, which are revealed to extremely suspicious code from the point of view of the analyzer, but harmless from the point of view of the developer. You can debate for a long time on whether macros are evil or not, but you can say for sure that the analyzer does not like them. To get rid of some annoying macro, there is an option to suppress false positives. It is enough to write:
//-V:RB_TYPE_P:501 

And all the V501 diagnostic triggers, in which the RB_TYPE_P macro appears, will disappear.

Python




Fragment N1


 #ifdef MS_WINDOWS typedef SOCKET SOCKET_T; #else typedef int SOCKET_T; #endif typedef struct { PyObject_HEAD SOCKET_T sock_fd; /* Socket file descriptor */ .... } PySocketSockObject; static int internal_select(PySocketSockObject *s, int writing, _PyTime_t interval, int connect) { .... if (s->sock_fd < 0) // <= return 0; .... } 

V547 Expression 's-> sock_fd <0' is always false. Unsigned type value is never <0. socketmod.c 655

The Windows SOCKET type is unsigned, so comparing it with zero is meaningless. In order to check whether the socket () function returned a valid handle, you need to compare its value with INVALID_SOCKET . It is worth noting that in Linux this comparison would work correctly, since the signed type int is used as the socket type and the value -1 is signaled to the error. However, for testing, it is better to use special macros or constants.

Similar checks for which a warning was issued:

Fragment N2


 int ASN1_PRINTABLE_type(const unsigned char *s, int len) { int c; int ia5 = 0; .... if (!(((c >= 'a') && (c <= 'z')) || ((c >= 'A') && (c <= 'Z')) || (c == ' ') || // <= ((c >= '0') && (c <= '9')) || (c == ' ') || (c == '\'') || // <= (c == '(') || (c == ')') || (c == '+') || (c == ',') || (c == '-') || (c == '.') || (c == '/') || (c == ':') || (c == '=') || (c == '?'))) ia5 = 1; .... } 

V501 There are identical sub-expressions '(c ==' ')' to the left operator. a_print.c 77

A typical example of an error resulting from Copy-Paste. Often, with a large number of copied blocks, the programmer loses attention and forgets to change one variable or constant in one of them. So in this case, in a large conditional expression, the values ​​with which the variable c is compared are confused. It is impossible to say exactly, but it seems that the double quotation mark character "" "was forgotten.

Fragment N3


 static PyObject * semlock_acquire(SemLockObject *self, PyObject *args, PyObject *kwds) { .... HANDLE handles[2], sigint_event; .... /* prepare list of handles */ nhandles = 0; handles[nhandles++] = self->handle; if (_PyOS_IsMainThread()) { sigint_event = _PyOS_SigintEvent(); assert(sigint_event != NULL); handles[nhandles++] = sigint_event; } /* do the wait */ Py_BEGIN_ALLOW_THREADS if (sigint_event != NULL) //<= ResetEvent(sigint_event); .... } 

V614 Potentially uninitialized pointer 'sigint_event' used. semaphore.c 120

In the case where the _PyOS_IsMainThread () function returns false , the sigint_event pointer will remain uninitialized. This will lead to undefined behavior. This error can be easily overlooked in the debug version, where the pointer is likely to be initialized to zero.

Fragment N4


 #define BN_MASK2 (0xffffffffffffffffLL) int BN_mask_bits(BIGNUM *a, int n) { .... a->d[w] &= ~(BN_MASK2 << b); //<= .... } 

V610 Undefined behavior. Check the shift operator '<<'. The left operand '(0xffffffffffffffffffLL)' is negative. bn_lib.c 796

Despite the fact that the code works in most cases, this standard expression is an undefined behavior. You can read more about the shifts of negative numbers in the article by Andrei Karpov " Without knowing the ford, do not climb into the water. Part Three ". Should you avoid constructions, the result of which is not guaranteed by the standard, you decide, but it is better to refuse from such, what the analyzer warns us about.
 static PyObject * binascii_b2a_qp_impl(PyModuleDef *module, Py_buffer *data, int quotetabs, int istext, int header) { Py_ssize_t in, out; const unsigned char *databuf; .... if ((databuf[in] > 126) || (databuf[in] == '=') || (header && databuf[in] == '_') || ((databuf[in] == '.') && (linelen == 0) && (databuf[in+1] == '\n' || databuf[in+1] == '\r' || databuf[in+1] == 0)) || (!istext && ((databuf[in] == '\r') || (databuf[in] == '\n'))) || ((databuf[in] == '\t' || databuf[in] == ' ') && (in + 1 == datalen)) || ((databuf[in] < 33) && (databuf[in] != '\r') && (databuf[in] != '\n') && (quotetabs || (!quotetabs && ((databuf[in] != '\t') && // <= (databuf[in] != ' ')))))) { .... } .... } 

V728 Anonymous check can be simplified. The '||' quotetabs' and '! quotetabs' operator binascii.c 1453

This fragment is not erroneous, however, it is worth considering it separately. The warning is largely advisory in nature: the expression 'A || (! A && B) ' can and should be simplified to ' A || B ' : this will make it easier to read the already over complicated code.

Similar warnings:

Fragment N5


 static int dh_cms_set_peerkey(....) { .... int atype; .... /* Only absent parameters allowed in RFC XXXX */ if (atype != V_ASN1_UNDEF && atype == V_ASN1_NULL) goto err; .... } 

V590 Consider inspecting the 'atype! = - 1 && atype == 5' expression. The expression is misprint. dh_ameth.c 670

Oddly enough, errors in logical expressions are very often found even in large projects. The logical expression from this fragment is redundant: it can be simplified to ' atype == V_ASN1_NULL '. Most likely, based on the context, there is no error here, but such code looks suspicious.

Fragment N6


 static void cms_env_set_version(CMS_EnvelopedData *env) { .... if (env->originatorInfo || env->unprotectedAttrs) env->version = 2; env->version = 0; } 

V519 The 'env-> version' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 907, 908. cms_env.c 908

It is difficult to say what was originally meant by the author of this code. Possibly, else is missing here. Now there is no sense in if, since the value of the variable ' env-> version' will be overwritten in any case.

Fragment N7


 int _PyState_AddModule(PyObject* module, struct PyModuleDef* def) { PyInterpreterState *state; if (def->m_slots) { PyErr_SetString(PyExc_SystemError, "PyState_AddModule called on module with slots"); return -1; } state = GET_INTERP_STATE(); if (!def) return -1; .... } 

V595 The 'self-> extra' pointer was used before it was verified against nullptr. Check lines: 917, 923. _elementtree.c 917

The traditional mistake associated with dereferencing a null pointer, which occurs in almost every project. First, in the expression 'def-> m_slots', they were addressed to some address, and then it turned out that this address could be zero. As a result, the check for nullptr does not work, since the null pointer dereference will occur, which will lead to undefined program behavior, for example, to its fall.

Ruby




Fragment N1


 static void vm_set_main_stack(rb_thread_t *th, const rb_iseq_t *iseq) { VALUE toplevel_binding = rb_const_get(rb_cObject, rb_intern("TOPLEVEL_BINDING")); rb_binding_t *bind; rb_env_t *env; GetBindingPtr(toplevel_binding, bind); GetEnvPtr(bind->env, env); vm_set_eval_stack(th, iseq, 0, &env->block); /* save binding */ if (bind && iseq->body->local_size > 0) { bind->env = vm_make_env_object(th, th->cfp); } } 

V595 The 'bind' pointer was used before it was verified against nullptr. Check lines: 377, 382. vm.c 377

Not escaped a similar error in the Ruby project. The 'if (bind)' check will not save from trouble, since bind was de-referenced above by code. In total, there were more than 30 such warnings in Ruby; it makes no sense to bring them all.

Fragment N2


 static int code_page_i(....) { table = realloc(table, count * sizeof(*table)); if (!table) return ST_CONTINUE; .... } 

V701 realloc () possible leak: when realloc () fails in allocating memory, the original pointer 'table' is lost. Consider assigning realloc () to a temporary pointer. file.c 169

In this section of the code, the value of realloc is stored in the same variable that is used as the argument. In the case where realloc returns nullptr , the initial value of the pointer will be lost, leading to a memory leak.

Fragment N3


 static int w32_symlink(UINT cp, const char *src, const char *link) { .... BOOLEAN ret; typedef DWORD (WINAPI *create_symbolic_link_func) (WCHAR*, WCHAR*, DWORD); static create_symbolic_link_func create_symbolic_link = (create_symbolic_link_func)-1; .... ret = create_symbolic_link(wlink, wsrc, flag); ALLOCV_END(buf); if (!ret) { int e = GetLastError(); errno = map_errno(e); return -1; } return 0; } 

V724 Converting type 'DWORD' to type 'BOOLEAN' can lead to a loss of high-order bits. Non-zero value can become 'FALSE'. win32.c 4974

The BOOLEAN type is used in WinAPI as a logical type. It is declared as follows:
 typedef unsigned char BYTE; typedef BYTE BOOLEAN; 

DWORD is a 32-bit unsigned number. Therefore, if we bring, for example, the DWORD value 0xffffff00 to BOOLEAN (or any other one whose low-order bit is zero), then it will become 0, that is, FALSE.

Fragment N4


 static VALUE rb_str_split_m(int argc, VALUE *argv, VALUE str) { .... char *ptr = RSTRING_PTR(str); long len = RSTRING_LEN(str); long start = beg; .... if (ptr+start == ptr+len) start++; .... } 

V584 The 'ptr' value is the operator. The expression is not correct. string.c 7211

In both parts of the comparison, the addition of ptr is present, therefore, it can be omitted:
 if (start == len) 

Most likely in this fragment there is no error. However, often in such expressions actually want to compare two different variables. Therefore, it is better to pay attention to such comparisons.

Results


After analyzing all the issued general-purpose diagnostic warnings and removing all false positives, I arrived at the following error distribution:



Most of the Ruby alarms fell on the V610 warning (369 alarms!), But even if they are excluded, the situation will not change: the number of suspicious places is still less in Python.

V595 diagnostics turned out to be the most frequent: it met 17 times in Python code, and 37 times in Ruby code.

But the most interesting is the density error ratio. For this characteristic, Python also differs for the better. The results of the calculations can be found in the table:



It may seem like a bit of mistakes. This is not true. First, not all of the errors found are critical. For example, the previously mentioned V610 diagnostics detect errors from the point of view of the C ++ language. However, in practice, for the set of compilers used, the result can always be correct. Although this error does not cease to be errors, they in no way affect the work of the program. Secondly, you need to consider the size of the verified code. Therefore, we can talk about the high quality of these projects. So far, this assessment is subjective, since previously we did not calculate the density of errors in proven projects. But we will try to do it in the future, so that later comparisons can be made.

Conclusion




Python and Ruby are very popular: they are written by millions of developers from around the world. With such projects and community activity, good code quality, excellent testing and the use of static analysis (both projects are checked using Coverity) it is difficult to find a large number of errors. However, PVS-Studio managed to find several suspicious places. But it should be understood that it is precisely regular code checking that can seriously ease the life of developers. It is best to correct the error immediately before the changes get into the repository and release - a static analyzer can help with this, like no other.

I suggest everyone to try PVS-Studio on their projects.


If you want to share this article with an English-speaking audience, then please use the link to the translation: Pavel Belikov. Python and Ruby implementations compared by the error density .

Read the article and have a question?
Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please review the list.


Update. This article contains inaccuracies. Please review the " Clarification on CPython and Ruby Project Verification ".

Source: https://habr.com/ru/post/306204/


All Articles