📜 ⬆️ ⬇️

PHP verification note



PHP is a general purpose scripting language intensively used for developing web applications. Currently supported by the overwhelming majority of hosting providers and is one of the leaders among programming languages ​​used to create dynamic websites.

In the case of compilers and interpreters, the source code and testing, as a rule, are subject to increased requirements for quality and reliability. However, the PHP interpreter source code found some suspicious places.
')
This article will examine the results of checking the PHP interpreter, obtained with PVS-Studio 5.18.



Same conditional expression


V501 There are identical sub-expressions '! Memcmp ("auto", charset_hint, 4)' operator. html.c 396
static enum entity_charset determine_charset(char *charset_hint TSRMLS_DC) { .... if ((len == 4) /* sizeof (none|auto|pass) */ && //<== (!memcmp("pass", charset_hint, 4) || !memcmp("auto", charset_hint, 4) || //<== !memcmp("auto", charset_hint, 4))) //<== { charset_hint = NULL; len = 0; } .... } 

The conditional expression contains a call to the 'memcmp' functions with the same parameters. The comment / * sizeof (none | auto | pass) * / tells us that the value “none” must be passed to one of the functions.

Always a false condition


V605 Consider verifying the expression: shell_wrote> - 1. An unsigned value is compared to the number -1. php_cli.c 266
 PHP_CLI_API size_t sapi_cli_single_write(....) { .... size_t shell_wrote; shell_wrote = cli_shell_callbacks.cli_shell_write(....); if (shell_wrote > -1) { //<== return shell_wrote; } .... } 

This comparison is an obvious mistake. '-1' is converted to the maximum value of the 'size_t' type, so the condition will always be false, and the entire check has gone to no. Perhaps the variable 'shell_wrote' used to have a signed type, but after refactoring, they did not take into account the features of operations with unsigned types.

Incorrect condition


V547 Expression 'tmp_len> = 0' is always true. Unsigned type value is always> = 0. ftp_fopen_wrapper.c 639
 static size_t php_ftp_dirstream_read(....) { size_t tmp_len; .... /* Trim off trailing whitespace characters */ tmp_len--; while (tmp_len >= 0 && //<== (ent->d_name[tmp_len] == '\n' || ent->d_name[tmp_len] == '\r' || ent->d_name[tmp_len] == '\t' || ent->d_name[tmp_len] == ' ')) { ent->d_name[tmp_len--] = '\0'; } .... } 

The 'size_t' type, being unsigned, allows indexing the maximum number of array elements for the current bitness of the application. Check (tmp_len> = 0) is incorrect. In the worst case, the decrement can lead to index overflow and memory access outside the array. Most likely, the code is executed correctly due to additional conditions and correct source data, but there is a danger of "looping" or going beyond the array.

Unsigned Difference


V555 The expression 'out_buf_size - ocnt> 0' will work as 'out_buf_size! = Ocnt'. filters.c 1702
 static int strfilter_convert_append_bucket( { size_t out_buf_size; .... size_t ocnt, icnt, tcnt; .... if (out_buf_size - ocnt > 0) { //<== .... php_stream_bucket_append(buckets_out, new_bucket TSRMLS_CC); } else { pefree(out_buf, persistent); } .... } 

Perhaps the 'else' branch will perform less frequently than it should, because the difference between unsigned numbers will almost always be greater than zero. The exception will be the equality of the operands: in this case, the condition is better to rewrite to a more informative version.

Pointer dereference


V595 The 'function_name' pointer was used before it was verified against nullptr. Check lines: 4859, 4860. basic_functions.c 4859
 static int user_shutdown_function_call(zval *zv TSRMLS_DC) { .... php_error(E_WARNING, "....", function_name->val); //<== if (function_name) { //<== STR_RELEASE(function_name); } .... } 

Checking the pointer after dereference always looks suspicious. In case of a real error, it can lead to the crash of the program.

Similar place:

Insidious optimization


V597 The compiler couldn’t delete the memset function call, which is used to flush the final buffer. The RtlSecureZeroMemory () function should be used to erase the private data. php_crypt_r.c 421
 /* * MD5 password encryption. */ char* php_md5_crypt_r(const char *pw,const char *salt, char *out) { static char passwd[MD5_HASH_MAX_LEN], *p; unsigned char final[16]; .... /* Don't leave anything around in vm they could use. */ memset(final, 0, sizeof(final)); //<== return (passwd); } 

The array 'final' may contain private information about the password, which is then reset, but the call to the function 'memset' will be deleted by the compiler. Details of why this can happen and why it is dangerous can be found in the article “ Overwrite Memory - Why? ” And diagnostic descriptions for V597 .

Similar places:

Can we trust the libraries used?


It is worth noting that third-party libraries make a great contribution to the development of the project, allowing the re-use of already implemented algorithms, but their quality also needs to be monitored, as well as the main project. I will give just a few examples from some libraries in order to follow the topic of the article and simply raise the issue of trust in third-party libraries.

The PHP interpreter uses many libraries, some of which are slightly rewritten for themselves.

libsqlite


V579 The sqlite3_result_blob It is possibly a mistake. Inspect the third argument. sqlite3.c 82631
 static void statInit(....) { Stat4Accum *p; .... sqlite3_result_blob(context, p, sizeof(p), stat4Destructor); .... } 

Most likely, they wanted to get the size of the object, not the pointer. It should have written sizeof (* p).

pcrelib


V501 There are identical sub-expressions '(1 << ucp_gbL)' operator. pcre_tables.c 161
 const pcre_uint32 PRIV(ucp_gbtable[]) = { (1<<ucp_gbLF), 0, 0, .... (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)| //<== (1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbLV)|(1<<ucp_gbLVT), //<== (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbV)| (1<<ucp_gbT), .... }; 

In the expression for computing one element of the array there is a repeating one (1 << ucp_gbL). Judging by the code below, one of the variables ucp_gbL could well be called ucp_gbT, or simply superfluous.

PDO


V595 The 'dbh' pointer was used before it was verified against nullptr. Check lines: 103, 110. pdo_dbh.c 103
 PDO_API void pdo_handle_error(pdo_dbh_t *dbh, ....) { pdo_error_type *pdo_err = &dbh->error_code; //<== .... if (dbh == NULL || dbh->error_mode == PDO_ERRMODE_SILENT) { return; } .... } 

Here, at the very beginning of the function, the dereference of the incoming pointer is performed, and then it is checked for validity.

libmagic


V519 The '* code' variable is assigned twice twice successively. Perhaps this is a mistake. Check lines: 100, 101. encoding.c 101
 protected int file_encoding(...., const char **code, ....) { if (looks_ascii(buf, nbytes, *ubuf, ulen)) { .... } else if (looks_utf8_with_BOM(buf, nbytes, *ubuf, ulen) > 0) { DPRINTF(("utf8/bom %" SIZE_T_FORMAT "u\n", *ulen)); *code = "UTF-8 Unicode (with BOM)"; *code_mime = "utf-8"; } else if (file_looks_utf8(buf, nbytes, *ubuf, ulen) > 1) { DPRINTF(("utf8 %" SIZE_T_FORMAT "u\n", *ulen)); *code = "UTF-8 Unicode (with BOM)"; //<== *code = "UTF-8 Unicode"; //<== *code_mime = "utf-8"; } else if (....) { .... } } 

Two times set the encoding to a variable, one line is superfluous and, possibly, leads to the wrong behavior of the program further.

Conclusion


Despite the fact that PHP has existed for a long time and is popular, there are still suspicious places in its code and used libraries, although this kind of project is probably checked by various analyzers.

Using static analysis regularly, you can save a lot of time on solving more useful tasks.

This article is in English.


If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov. A Post About Analyzing PHP .

Read the article and have a question?
Often our articles are asked the same questions. We collected the answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 . Please review the list.

Source: https://habr.com/ru/post/235189/


All Articles