Why would a simple PHP developer need a source debug? Well, for example, if he noticed some not obvious behavior and wants to understand him at the most “low” level. I would like to talk about such an interesting behavior for me, as well as the process of “debag surasov”.
Motivation
Let's start with the bat, here are two similar pieces of code:
Time$arr = []; for ($i =0; $i < 300; $i++) { $arr[rand(1,1000)] = 1; } $sum = 0; for ($i = 1001; $i < 200000000; $i++){ if (array_key_exists($i, $arr)){ $sum++; } }
Two $arr = []; for ($i =0; $i < 300; $i++) { $arr[rand(1,1000)] = 1; } sort($arr); $sum = 0; for ($i = 1001; $i < 200000000; $i++){ if (array_key_exists($i, $arr)){ $sum++; } }
The difference between them is that in the second case we sorted the $ arr array, thereby updating the keys 0..count ($ arr) -1. And I was interested in the moment that the first script runs 6.0 seconds, while the second is 4.7 seconds. It turns out about 20 percent difference.
')
If you are familiar with the internal structure of the php hash table, as well as with the hash function, then you either already know the answer or can easily guess. Well, for the rest, I'll tell you how to set up the environment for debugging and find out what's the matter.
Customize the environment
We will need:
- source php.
- debager gdb.
- php itself compiled in debug mode.
We clone php source files to ourselves:
git clone https://git.php.net/repository/php-src.git cd php-src git checkout -b PHP-7.1 origin/PHP-7.1
If you do not have the libraries necessary for compiling, put them too:
sudo apt-get install build-essential autoconf automake libtool
And compile:
./buildconf ./configure --disable-all --enable-debug –prefix=$HOME/dev/bin/php make && make install
We start the search
To begin with, we will determine the starting point of our investigation, this will be the function
array_key_exists . Find it in the source of labor will not be if you know how to declare functions in PHP - this is the macro
PHP_FUNCTION (% function_name%) . Do a search for the source code of the string
PHP_FUNCTION (array_key_exists) and find our function in
extension / standart / array.c . Code:
PHP_FUNCTION (array_key_exists) PHP_FUNCTION(array_key_exists) { zval *key; HashTable *array; ZEND_PARSE_PARAMETERS_START(2, 2) Z_PARAM_ZVAL(key) Z_PARAM_ARRAY_OR_OBJECT_HT(array) ZEND_PARSE_PARAMETERS_END(); switch (Z_TYPE_P(key)) { case IS_STRING: if (zend_symtable_exists_ind(array, Z_STR_P(key))) { RETURN_TRUE; } RETURN_FALSE; case IS_LONG: if (zend_hash_index_exists(array, Z_LVAL_P(key))) { RETURN_TRUE; } RETURN_FALSE; case IS_NULL: if (zend_hash_exists_ind(array, ZSTR_EMPTY_ALLOC())) { RETURN_TRUE; } RETURN_FALSE; default: php_error_docref(NULL, E_WARNING, "The first argument should be either a string or an integer"); RETURN_FALSE; } }
Before us is a high-level wrapper for core functions of a language, dig a little deeper into the function
zend_hash_index_exists .
We can find it in
Zend / zend_hash.c :
ZEND_API zend_bool ZEND_FASTCALL zend_hash_index_exists(const HashTable *ht, zend_ulong h) { Bucket *p; IS_CONSISTENT(ht); if (ht->u.flags & HASH_FLAG_PACKED) { if (h < ht->nNumUsed) { if (Z_TYPE(ht->arData[h].val) != IS_UNDEF) { return 1; } } return 0; } p = zend_hash_index_find_bucket(ht, h); return p ? 1 : 0; }
Debashim
We start debager:
gdb --args /home/your_pc/dev/bin/php/bin/php /home/your_pc/array_w_sort_test.php
1st parameter is the path to compiled php, 2nd is the path to our test in which we use sorting.
After we hit the debager, it’s worth running the following command:
(gdb) source ~/dev/c/php-src/.gdbinit
This will give us some very useful commands, you can see the list by writing:
(gdb) help user-defined
Put a breakpoint on the function
zend_hash_index_exists :
(gdb) break zend_hash_index_exists
And run the script:
(gdb) run
We see on the screen something like:
(gdb) run Starting program: /home/your_pc/dev/bin/php/bin/php /home/your_pc/array_w_sort_test.php Breakpoint 1, zend_hash_index_exists (ht=0x7ffff7059780, h=1001) at /home/your_pc/dev/c/php-src/Zend/zend_hash.c:2030 2030 IS_CONSISTENT(ht);
Great, we're inside the thread of execution!
To see where we are in the PHP code, run the command:
(gdb) zbacktrace
The result tells us that the entry point was defined correctly:
[0x7ffff7015240] array_key_exists(1001, array(251)[0x7ffff70152a0]) [internal function] [0x7ffff7015030] (main) /home/your_pc/array_w_sort_test.php:14
To navigate through the source code use the command:
(gdb) next
or
(gdb) step
The first will not fall into the functions that occur along the way, the second, respectively, will be.
Run the command to view the parameters with which the function is called:
(gdb) info args ht = 0x7ffff7059780 h = 1001
ht is a pointer to the hash table, h is the desired key.
And so, we get to the line:
if (ht->u.flags & HASH_FLAG_PACKED) {
We do next - the condition is satisfied! Ok, let's do the same steps for the test without sorting - the condition is not met! Well, we found the “plug”, which makes such a difference in speed. We will understand in more detail.
A little about the structure of the PHP hash tables.
Run the
ptype command to see what the hash table is ht
(gdb) ptype ht type = const struct _zend_array { zend_refcounted_h gc; union { struct {...} v; uint32_t flags; } u; uint32_t nTableMask; Bucket *arData; uint32_t nNumUsed; uint32_t nNumOfElements; uint32_t nTableSize; uint32_t nInternalPointer; zend_long nNextFreeElement; dtor_func_t pDestructor; } *
Here we are interested in:
- arData - C array containing buckets. A bucket is a C structure containing a zval - a stored item, a key in a php array, and also a hash of this key as a number.
- NnumUsed - the maximum occupied index in the arData + 1 array.
- u.flags - no signed int, each bit of which corresponds to a flag.
- In addition, carefully looking at the structure, you ask how the key hash is mapped to the arData array? For this, php developers use the translation table . It is stored “behind” the arData array ( arData [-1] , arData [-2] , etc.) and is a map of the hashes to the indices of the arData array.
- Finally, you need to know that the hash operation does not apply to integer keys, and they are translated into the hash keys of the table “as is”.
Back to code
ht → u.flags set of flags is collapsed into a number. For the
HASH_FLAG_PACKED flag
, we are interested in the 3rd bit.
To get the value of the flags, run the command:
(gdb) print ht->u.flags $1 = 30
So it is, in the 3rd bit unit.
Since the condition is true we get here:
if (h < ht->nNumUsed) { if (Z_TYPE(ht->arData[h].val) != IS_UNDEF) { return 1; } } return 0;
In line 1, we check whether the value of the required key is valid with respect to the
arData array. If yes, we just have to check whether the
arData array at address
h contains an element.
Well, if the flag
HASH_FLAG_PACKED is not set, then we will get here.
p = zend_hash_index_find_bucket(ht, h);
This function will search for the required element, but using the translation table mentioned above.
Well, that's all the magic. Debagger can be closed. There is only 1 question left. And what is this actually for the
HASH_FLAG_PACKED flag and where is it set?
This flag signals us about
Packed hashtable optimization - all the keys of the php array are numbers and are sorted in ascending order. Regarding the test example, it is not difficult to guess, this flag is set inside the sort function. As it turned out, this optimization affects a fairly large number of aspects of working with the language. There probably is no other article you can write.
And my story ends here, we looked at some of the techniques of working in a bunch of php + dbg, and out of the corner of our eyes took a look at one interesting optimization in hash table algorithms. I hope the methods described above will help you to research the wonderful PHP tool.