
PHP is a general purpose scripting language intensively used for developing web applications. Currently supported by the overwhelming majority of hosting providers and is one of the leaders among programming languages ​​used to create dynamic websites.
In the case of compilers and interpreters, the source code and testing, as a rule, are subject to increased requirements for quality and reliability. However, the PHP interpreter source code found some suspicious places.
')
This article will examine the results of checking the
PHP interpreter, obtained with
PVS-Studio 5.18.
Same conditional expression
V501 There are identical sub-expressions '! Memcmp ("auto", charset_hint, 4)' operator. html.c 396
static enum entity_charset determine_charset(char *charset_hint TSRMLS_DC) { .... if ((len == 4) &&
The conditional expression contains a call to the 'memcmp' functions with the same parameters. The comment / * sizeof (none | auto | pass) * / tells us that the value “none” must be passed to one of the functions.
Always a false condition
V605 Consider verifying the expression: shell_wrote> - 1. An unsigned value is compared to the number -1. php_cli.c 266
PHP_CLI_API size_t sapi_cli_single_write(....) { .... size_t shell_wrote; shell_wrote = cli_shell_callbacks.cli_shell_write(....); if (shell_wrote > -1) {
This comparison is an obvious mistake. '-1' is converted to the maximum value of the 'size_t' type, so the condition will always be false, and the entire check has gone to no. Perhaps the variable 'shell_wrote' used to have a signed type, but after refactoring, they did not take into account the features of operations with unsigned types.
Incorrect condition
V547 Expression 'tmp_len> = 0' is always true. Unsigned type value is always> = 0. ftp_fopen_wrapper.c 639
static size_t php_ftp_dirstream_read(....) { size_t tmp_len; .... tmp_len--; while (tmp_len >= 0 &&
The 'size_t' type, being unsigned, allows indexing the maximum number of array elements for the current bitness of the application. Check (tmp_len> = 0) is incorrect. In the worst case, the decrement can lead to index overflow and memory access outside the array. Most likely, the code is executed correctly due to additional conditions and correct source data, but there is a danger of "looping" or going beyond the array.
Unsigned Difference
V555 The expression 'out_buf_size - ocnt> 0' will work as 'out_buf_size! = Ocnt'. filters.c 1702
static int strfilter_convert_append_bucket( { size_t out_buf_size; .... size_t ocnt, icnt, tcnt; .... if (out_buf_size - ocnt > 0) {
Perhaps the 'else' branch will perform less frequently than it should, because the difference between unsigned numbers will almost always be greater than zero. The exception will be the equality of the operands: in this case, the condition is better to rewrite to a more informative version.
Pointer dereference
V595 The 'function_name' pointer was used before it was verified against nullptr. Check lines: 4859, 4860. basic_functions.c 4859
static int user_shutdown_function_call(zval *zv TSRMLS_DC) { .... php_error(E_WARNING, "....", function_name->val);
Checking the pointer after dereference always looks suspicious. In case of a real error, it can lead to the crash of the program.
Similar place:
- V595 The 'callback_name' pointer was used before it was verified against nullptr. Check lines: 5007, 5021. basic_functions.c 5007
Insidious optimization
V597 The compiler couldn’t delete the memset function call, which is used to flush the final buffer. The RtlSecureZeroMemory () function should be used to erase the private data. php_crypt_r.c 421
char* php_md5_crypt_r(const char *pw,const char *salt, char *out) { static char passwd[MD5_HASH_MAX_LEN], *p; unsigned char final[16]; .... memset(final, 0, sizeof(final));
The array 'final' may contain private information about the password, which is then reset, but the call to the function 'memset' will be deleted by the compiler. Details of why this can happen and why it is dangerous can be found in the article “
Overwrite Memory - Why? ” And diagnostic descriptions for
V597 .
Similar places:
- V597 The compiler couldn’t delete the memset function call, which is used to flush the final buffer. The RtlSecureZeroMemory () function should be used to erase the private data. php_crypt_r.c 421
- V597 The compiler couldn’t delete the memset function call, which is used to flush 'output' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. crypt.c 214
- V597 The compiler could delete the memset function call, which is used to flush the temp_result buffer. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_sha512.c 622
- V597 The compiler couldn’t delete the memset function call, which is used to flush the ctx object. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_sha512.c 625
- V597 The compiler could delete the memset function call, which is used to flush the alt_ctx object. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_sha512.c 626
- V597 The compiler could delete the memset function call, which is used to flush the temp_result buffer. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_sha256.c 574
- V597 The compiler couldn’t delete the memset function call, which is used to flush the ctx object. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_sha256.c 577
- V597 The compiler could delete the memset function call, which is used to flush the alt_ctx object. The RtlSecureZeroMemory () function should be used to erase the private data. crypt_sha256.c 578
Can we trust the libraries used?
It is worth noting that third-party libraries make a great contribution to the development of the project, allowing the re-use of already implemented algorithms, but their quality also needs to be monitored, as well as the main project. I will give just a few examples from some libraries in order to follow the topic of the article and simply raise the issue of trust in third-party libraries.
The PHP interpreter uses many libraries, some of which are slightly rewritten for themselves.
libsqlite
V579 The sqlite3_result_blob It is possibly a mistake. Inspect the third argument. sqlite3.c 82631
static void statInit(....) { Stat4Accum *p; .... sqlite3_result_blob(context, p, sizeof(p), stat4Destructor); .... }
Most likely, they wanted to get the size of the object, not the pointer. It should have written sizeof (* p).
pcrelib
V501 There are identical sub-expressions '(1 << ucp_gbL)' operator. pcre_tables.c 161
const pcre_uint32 PRIV(ucp_gbtable[]) = { (1<<ucp_gbLF), 0, 0, .... (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)|
In the expression for computing one element of the array there is a repeating one (1 << ucp_gbL). Judging by the code below, one of the variables ucp_gbL could well be called ucp_gbT, or simply superfluous.
PDO
V595 The 'dbh' pointer was used before it was verified against nullptr. Check lines: 103, 110. pdo_dbh.c 103
PDO_API void pdo_handle_error(pdo_dbh_t *dbh, ....) { pdo_error_type *pdo_err = &dbh->error_code;
Here, at the very beginning of the function, the dereference of the incoming pointer is performed, and then it is checked for validity.
libmagic
V519 The '* code' variable is assigned twice twice successively. Perhaps this is a mistake. Check lines: 100, 101. encoding.c 101
protected int file_encoding(...., const char **code, ....) { if (looks_ascii(buf, nbytes, *ubuf, ulen)) { .... } else if (looks_utf8_with_BOM(buf, nbytes, *ubuf, ulen) > 0) { DPRINTF(("utf8/bom %" SIZE_T_FORMAT "u\n", *ulen)); *code = "UTF-8 Unicode (with BOM)"; *code_mime = "utf-8"; } else if (file_looks_utf8(buf, nbytes, *ubuf, ulen) > 1) { DPRINTF(("utf8 %" SIZE_T_FORMAT "u\n", *ulen)); *code = "UTF-8 Unicode (with BOM)";
Two times set the encoding to a variable, one line is superfluous and, possibly, leads to the wrong behavior of the program further.
Conclusion
Despite the fact that PHP has existed for a long time and is popular, there are still suspicious places in its code and used libraries, although this kind of project is probably checked by various analyzers.
Using static analysis regularly, you can save a lot of time on solving more useful tasks.
This article is in English.
If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov.
A Post About Analyzing PHP .