
Internal representation of values in PHP 7 (part 1)

Because of the amount of material, this article is split into two parts. In this first part I'll describe how the zval (Zend value) implementation has changed since PHP 5 and discuss how references are implemented. The second part will cover the implementation of individual data types, such as strings and objects, in detail.

Zvals in PHP 5


The zval structure in the fifth version looks like this:

    typedef struct _zval_struct {
        zvalue_value value;
        zend_uint refcount__gc;
        zend_uchar type;
        zend_uchar is_ref__gc;
    } zval;

As you can see, the structure consists of a value, a type and some additional __gc information, which I will discuss below. The value is a union of the different possible values that a zval can store:

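    typedef union _zvalue_value {
        long lval;                 // for booleans, integers and resources
        double dval;               // for floating-point numbers
        struct {                   // for strings
            char *val;
            int len;
        } str;
        HashTable *ht;             // for arrays
        zend_object_value obj;     // for objects
        zend_ast *ast;             // for constant expressions
    } zvalue_value;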

A C union is a structure in which only one member can be active at a time, and whose size equals the size of its largest member. All members of the union are stored at the same place in memory and are interpreted differently depending on which one you access. If you read lval, the stored bits are interpreted as a signed integer. If you read dval, they are interpreted as a double-precision floating-point number. And so on.

To find out which member of the union is currently in use, look at the type member:

    #define IS_NULL     0      /* does not use value */
    #define IS_LONG     1      /* uses lval */
    #define IS_DOUBLE   2      /* uses dval */
    #define IS_BOOL     3      /* uses lval with the values 0 and 1 */
    #define IS_ARRAY    4      /* uses ht */
    #define IS_OBJECT   5      /* uses obj */
    #define IS_STRING   6      /* uses str */
    #define IS_RESOURCE 7      /* uses lval, which holds the resource ID */

    /* special types used for late binding of constants */
    #define IS_CONSTANT     8
    #define IS_CONSTANT_AST 9
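The tag-plus-union idea can be sketched in a few lines of C. This is only an illustration: the `demo_` types, tags and the `demo_describe` helper are hypothetical and much simpler than the real PHP 5 definitions. The point is that the code reads only the union member selected by the type tag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified tags; these are not the real PHP 5 constants. */
enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_STRING };

typedef struct demo_zval {
    union {
        long lval;
        double dval;
        struct { const char *val; int len; } str;
    } value;
    unsigned char type;   /* tells us which union member is active */
} demo_zval;

/* Write a textual form of zv into buf, reading only the member
 * of the union that the type tag says is in use. */
static int demo_describe(const demo_zval *zv, char *buf, size_t size)
{
    assert(zv != NULL && buf != NULL);
    switch (zv->type) {
    case DEMO_NULL:
        return snprintf(buf, size, "null");
    case DEMO_LONG:
        return snprintf(buf, size, "long(%ld)", zv->value.lval);
    case DEMO_DOUBLE:
        return snprintf(buf, size, "double(%g)", zv->value.dval);
    case DEMO_STRING:
        return snprintf(buf, size, "string(%.*s)",
                        zv->value.str.len, zv->value.str.val);
    default:
        return snprintf(buf, size, "unknown");
    }
}
```

Reading a member other than the one the tag names would reinterpret the same bytes under a different type, which is why every access goes through the switch.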


Reference counting in PHP 5


With a few exceptions, zvals in PHP 5 are allocated on the heap, so PHP needs some way to track which zvals are still in use and which can be freed. Reference counting is used for this: the refcount__gc member stores how many times the zval is referenced. For example, in $a = $b = 42 the value 42 is referenced by two variables, so its refcount is 2. When the refcount reaches zero, the value is no longer used and can be freed.

Note that the references counted by refcount (how many times a value is currently used) have nothing to do with PHP references (created with &). To avoid confusion, from here on I will use the terms "reference" and "PHP reference". PHP references are set aside for now.

Closely related to reference counting is the idea of copy-on-write: a zval can be shared only as long as it is not modified. To modify a shared zval, it must first be duplicated (separated), and the modification is then applied to the copy.

The following example shows copy-on-write and the destruction of a zval:

    $a = 42;   // $a         -> zval_1(type=IS_LONG, value=42, refcount=1)
    $b = $a;   // $a, $b     -> zval_1(type=IS_LONG, value=42, refcount=2)
    $c = $b;   // $a, $b, $c -> zval_1(type=IS_LONG, value=42, refcount=3)

    // The following line causes a zval separation
    $a += 1;   // $b, $c -> zval_1(type=IS_LONG, value=42, refcount=2)
               // $a     -> zval_2(type=IS_LONG, value=43, refcount=1)

    unset($b); // $c -> zval_1(type=IS_LONG, value=42, refcount=1)
               // $a -> zval_2(type=IS_LONG, value=43, refcount=1)

    unset($c); // zval_1 is destroyed, because refcount=0
               // $a -> zval_2(type=IS_LONG, value=43, refcount=1)
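The same sequence can be mimicked with a small C model. This is a sketch under simplifying assumptions, not engine code: `demo_zval` and the `demo_` helpers are hypothetical names, the value is always an integer, and the cycle collector is ignored.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap-allocated value with its own reference count,
 * in the spirit of a PHP 5 zval holding an integer. */
typedef struct demo_zval {
    long value;
    unsigned refcount;
} demo_zval;

static demo_zval *demo_new(long value)
{
    demo_zval *zv = malloc(sizeof *zv);
    if (zv == NULL) {
        abort();
    }
    zv->value = value;
    zv->refcount = 1;
    return zv;
}

/* $b = $a: both variables point at the same zval. */
static demo_zval *demo_share(demo_zval *zv)
{
    zv->refcount++;
    return zv;
}

/* unset($x): drop one reference; returns 1 if the zval was freed. */
static int demo_release(demo_zval *zv)
{
    assert(zv->refcount > 0);
    if (--zv->refcount == 0) {
        free(zv);
        return 1;
    }
    return 0;
}

/* Copy-on-write: call before modifying. A zval that is shared is
 * left to its other users and a private copy is returned instead. */
static demo_zval *demo_separate(demo_zval *zv)
{
    if (zv->refcount == 1) {
        return zv;
    }
    zv->refcount--;
    return demo_new(zv->value);
}
```

A write such as `$a += 1` then becomes `a = demo_separate(a); a->value += 1;`, which leaves the zval shared by `$b` and `$c` untouched.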

Reference counting has one serious flaw: it cannot detect and release circular references. For this PHP uses an additional mechanism, the cycle collector. Every time a refcount is decremented and there is a chance that the zval is part of a cycle, the zval is written to the root buffer. When this buffer is full, potential cycles are marked and freed by the collector.

The following structure is used to ensure the work of this cyclic collector:

    typedef struct _zval_gc_info {
        zval z;
        union {
            gc_root_buffer *buffered;
            struct _zval_gc_info *next;
        } u;
    } zval_gc_info;


The zval_gc_info structure embeds a normal zval plus one additional pointer. Note that u is a union, so this is really a single pointer that can have one of two types. The buffered pointer records where in the root buffer the zval is referenced, so that the entry can be removed if the zval is destroyed before the cycle collector runs (which is very likely). next is used when the collector destroys values.

The need for change


Let's talk a little about sizes (everything below refers to 64-bit systems). The zvalue_value union is 16 bytes, because both its str and obj members are that large. The whole zval structure is 24 bytes (due to padding), and zval_gc_info is 32 bytes. On top of that, allocating the zval on the heap adds another 16 bytes of allocation overhead. That gives a total of 48 bytes per zval, although the zval may be used in several places.

This is where doubts about the efficiency of the zval implementation creep in. Judge for yourself: suppose the zval stores a simple integer, which by itself takes 8 bytes. The type tag has to be stored in any case; it is a single byte, but because of struct padding it costs a full 8. To these 16 bytes we add another 16 for reference counting and the cycle collector, and another 16 for the heap allocation. And that is before counting the cost of performing the allocation and the later free.

It is fair to ask: does storing a simple integer really require reference counting, cycle collection and a heap allocation? Of course not. The main problems with the PHP 5 zval implementation follow directly from this: almost every zval needs a heap allocation, and every zval carries reference-counting and cycle-collection data even when the value is a plain integer that is not worth sharing and cannot form a cycle.

Zvals in PHP 7


PHP 7 has a new zval implementation. One of the major changes is that zvals are no longer allocated individually on the heap. In addition, the refcount is no longer stored in the zval itself but in the complex value it points to: a string, an array or an object. This has several benefits: simple values need neither an allocation nor reference counting, and because the refcount lives in the value itself, a value can be shared independently of the zval that points to it.

Here is the structure of the new zval :

    struct _zval_struct {
        zend_value value;
        union {
            struct {
                ZEND_ENDIAN_LOHI_4(
                    zend_uchar type,
                    zend_uchar type_flags,
                    zend_uchar const_flags,
                    zend_uchar reserved)
            } v;
            uint32_t type_info;
        } u1;
        union {
            uint32_t var_flags;
            uint32_t next;                 // hash collision chain
            uint32_t cache_slot;           // literal cache slot
            uint32_t lineno;               // line number (for ast nodes)
            uint32_t num_args;             // arguments number for EX(This)
            uint32_t fe_pos;               // foreach position
            uint32_t fe_iter_idx;          // foreach iterator index
        } u2;
    };


The first member is the value union, which is much as before. The second member is an integer holding type information, which a union further subdivides into individual bytes (you can ignore the ZEND_ENDIAN_LOHI_4 macro; it only ensures a consistent layout across platforms with different byte order). The important parts of this nested structure are type and type_flags, which I will discuss below.

There is one small problem here. value takes 8 bytes, and because of struct padding, adding even a single byte to it grows the zval to 16 bytes. But we clearly don't need 8 bytes just to store the type. That is why the zval has an additional u2 union, which is unused by default but can be used by the surrounding code to store 4 bytes of data. The different members of the union correspond to different uses of this extra storage.
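The padding argument can be checked with `sizeof`. The sketch below uses hypothetical `demo_` types rather than the real Zend definitions, and the concrete numbers in the comments assume a typical 64-bit platform where the value union is 8 bytes with 8-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zend_value: 8 bytes on common 64-bit platforms. */
typedef union demo_value {
    int64_t lval;
    double  dval;
    void   *ptr;
} demo_value;

/* A value plus a lone type byte: the compiler pads the struct up to
 * the alignment of the value, so the single byte costs far more. */
struct demo_value_and_type {
    demo_value value;
    uint8_t    type;
};

/* Same footprint, but the would-be padding is put to work as u1 and u2. */
struct demo_zval {
    demo_value value;
    union {
        struct { uint8_t type, type_flags, const_flags, reserved; } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t next;
        uint32_t lineno;
    } u2;
};

/* Bytes lost to padding when only a type byte follows the value. */
static size_t demo_wasted_padding(void)
{
    return sizeof(struct demo_value_and_type)
         - sizeof(demo_value) - sizeof(uint8_t);
}

/* u1 and u2 together add exactly two 32-bit words on top of the value. */
static void demo_check_layout(void)
{
    assert(sizeof(struct demo_zval)
           == sizeof(demo_value) + 2 * sizeof(uint32_t));
}
```

On such a platform both structs are 16 bytes, so the 4 bytes of u2 come for free: they occupy space that would otherwise be padding.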

In PHP 7, the value union is slightly different from the fifth version:

    typedef union _zend_value {
        zend_long         lval;
        double            dval;
        zend_refcounted  *counted;
        zend_string      *str;
        zend_array       *arr;
        zend_object      *obj;
        zend_resource    *res;
        zend_reference   *ref;
        zend_ast_ref     *ast;

        // Ignore these for now, they are special
        zval             *zv;
        void             *ptr;
        zend_class_entry *ce;
        zend_function    *func;
        struct {
            ZEND_ENDIAN_LOHI(
                uint32_t w1,
                uint32_t w2)
        } ww;
    } zend_value;


Note that value is now 8 bytes instead of 16. Only integers (lval) and doubles (dval) are stored in it directly; everything else is a pointer. All pointer types (except the special ones marked above) use reference counting and start with a common header defined by zend_refcounted:

    struct _zend_refcounted {
        uint32_t refcount;
        union {
            struct {
                ZEND_ENDIAN_LOHI_3(
                    zend_uchar type,
                    zend_uchar flags,
                    uint16_t   gc_info)
            } v;
            uint32_t type_info;
        } u;
    };


Unsurprisingly, this structure contains a refcount. In addition it holds type, flags and gc_info. type simply duplicates the zval type and lets the GC distinguish the different refcounted structures without storing a zval. flags is used for different purposes by different data types; I will cover that in the second part.

gc_info is the equivalent of buffered in the old zvals. Instead of a pointer into the root buffer, it now stores an index into it. Since the root buffer has a fixed capacity (10,000 elements), a 16-bit number is enough where a 64-bit pointer used to be. gc_info also encodes the "color" of the node, which is used to mark nodes during collection.
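To see why 16 bits are enough, here is one way an index and a color could share a 16-bit field. The bit layout is an assumption made for illustration (two high bits for the color, fourteen low bits for the index), and the `demo_` names are hypothetical; the real engine defines its own macros for this.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration only: the two high bits hold the color,
 * the fourteen low bits hold the root buffer index. Fourteen bits cover
 * indexes up to 16383, which is more than the 10,000 slots of the buffer. */
#define DEMO_INDEX_MASK 0x3fffu
#define DEMO_COLOR_SHIFT 14

static uint16_t demo_gc_pack(uint16_t index, uint16_t color)
{
    assert(index <= DEMO_INDEX_MASK);
    assert(color <= 3);
    return (uint16_t)((color << DEMO_COLOR_SHIFT) | index);
}

static uint16_t demo_gc_index(uint16_t gc_info)
{
    return (uint16_t)(gc_info & DEMO_INDEX_MASK);
}

static uint16_t demo_gc_color(uint16_t gc_info)
{
    return (uint16_t)(gc_info >> DEMO_COLOR_SHIFT);
}
```

Packing both pieces of information into one small integer is what lets the whole zend_refcounted header stay at 8 bytes.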

Zval memory management


I already mentioned that zvals are no longer allocated individually on the heap. They still have to be stored somewhere, though, so how does that work? Zvals are still mostly stored inside heap-allocated structures, but they are embedded in them directly. For example, a hash table now contains the zval itself instead of a pointer to a separate zval. The compiled-variables table of a function and the property table of an object are arrays of zvals. As such, zvals are now typically stored with one less level of indirection: what used to be a zval * is now a zval.

Previously, using a zval in a new place meant copying the zval * and incrementing its refcount. Now you copy the contents of the zval (ignoring u2) and, if the value is reference-counted, also increment the refcount of the value it points to.

How does PHP know whether a value is reference-counted? This cannot be determined from the type alone, because some types, such as strings and arrays, are not always refcounted. Instead, one bit of the zval's type_info member indicates whether the zval is refcounted.

Several bits are also used to encode type properties:

    #define IS_TYPE_CONSTANT            (1<<0)   /* special */
    #define IS_TYPE_IMMUTABLE           (1<<1)   /* special */
    #define IS_TYPE_REFCOUNTED          (1<<2)
    #define IS_TYPE_COLLECTABLE         (1<<3)
    #define IS_TYPE_COPYABLE            (1<<4)
    #define IS_TYPE_SYMBOLTABLE         (1<<5)   /* special */


There are three basic properties that a type can have: refcounted , collectable and copyable .

Collectable means that the zval can take part in a cycle. For example, strings are (often) refcounted, but there is no way to create a cycle with a string.

Copyable determines whether a value has to be copied when it is duplicated. Duplicating a zval that points to an array does not merely increment the array's refcount; a new, independent copy of the array is created. For some types, such as objects and resources, duplication should only increment the refcount. Such types are non-copyable. This matches the passing semantics of objects and resources (which, for the record, are not passed by reference).

The table below shows which types use which flags. "Simple types" are types such as integers or booleans, which do not use a pointer to a separate structure. Immutable arrays will be covered in more detail in the second part.

                    | refcounted | collectable | copyable | immutable
    ----------------+------------+-------------+----------+----------
    simple types    |            |             |          |
    string          |      x     |             |     x    |
    interned string |            |             |          |
    array           |      x     |      x      |     x    |
    immutable array |            |             |          |     x
    object          |      x     |      x      |          |
    resource        |      x     |             |          |
    reference       |      x     |             |          |

Let's look at two examples of how zval management works in practice. The first one uses integers:

    $a = 42;   // $a = zval_1(type=IS_LONG, value=42)

    $b = $a;   // $a = zval_1(type=IS_LONG, value=42)
               // $b = zval_2(type=IS_LONG, value=42)

    $a += 1;   // $a = zval_1(type=IS_LONG, value=43)
               // $b = zval_2(type=IS_LONG, value=42)

    unset($a); // $a = zval_1(type=IS_UNDEF)
               // $b = zval_2(type=IS_LONG, value=42)


Since integers are no longer shared, the two variables use separate zvals. Remember that these are now embedded in their containers rather than allocated separately, which is indicated by writing = instead of ->. Unsetting a variable sets the type of the corresponding zval to IS_UNDEF.

Now consider a second example, which involves a complex value:

    $a = [];   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

    $b = $a;   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
               // $b = zval_2(type=IS_ARRAY) ---^

    // Zval separation happens here
    $a[] = 1;  // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
               // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

    unset($a); // $a = zval_1(type=IS_UNDEF) and zend_array_2 is destroyed
               // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

Each variable still has its own (embedded) zval, but both zvals point to the same (refcounted) zend_array structure. When a modification is made, the array has to be duplicated. This case works much as it does in PHP 5.
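Both examples can be modelled in C to make the difference from PHP 5 concrete. This is a simplified sketch, not the engine's code: all `demo_` names are hypothetical, the array has a fixed capacity, and only one flag bit is modelled. The zval itself is copied by value; a refcount is touched only when the flag says the value is reference-counted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_FLAG_REFCOUNTED (1u << 2)   /* plays the role of IS_TYPE_REFCOUNTED */
#define DEMO_CAPACITY 8

enum { DEMO_UNDEF = 0, DEMO_LONG = 4, DEMO_ARRAY = 7 };

typedef struct demo_array {
    uint32_t refcount;               /* the count lives in the value, not the zval */
    size_t   len;
    long     items[DEMO_CAPACITY];   /* fixed capacity keeps the sketch short */
} demo_array;

typedef struct demo_zval {
    union { long lval; demo_array *arr; } value;
    uint8_t type;
    uint8_t type_flags;
} demo_zval;

static demo_zval demo_long(long v)
{
    demo_zval z;
    z.value.lval = v;
    z.type = DEMO_LONG;
    z.type_flags = 0;                /* simple value: no refcounting at all */
    return z;
}

static demo_zval demo_empty_array(void)
{
    demo_zval z;
    demo_array *arr = calloc(1, sizeof *arr);
    if (arr == NULL) {
        abort();
    }
    arr->refcount = 1;
    z.value.arr = arr;
    z.type = DEMO_ARRAY;
    z.type_flags = DEMO_FLAG_REFCOUNTED;
    return z;
}

/* $b = $a: copy the zval, bump the refcount only for refcounted values. */
static demo_zval demo_copy(const demo_zval *src)
{
    demo_zval dst = *src;
    if (dst.type_flags & DEMO_FLAG_REFCOUNTED) {
        dst.value.arr->refcount++;
    }
    return dst;
}

/* $a[] = item: separate the array first if it is shared. */
static void demo_array_push(demo_zval *zv, long item)
{
    demo_array *arr = zv->value.arr;
    assert(zv->type == DEMO_ARRAY);
    if (arr->refcount > 1) {
        demo_array *dup = malloc(sizeof *dup);
        if (dup == NULL) {
            abort();
        }
        *dup = *arr;
        dup->refcount = 1;
        arr->refcount--;
        zv->value.arr = arr = dup;
    }
    assert(arr->len < DEMO_CAPACITY);
    arr->items[arr->len++] = item;
}

/* unset($a): release the value if needed and mark the slot as undefined. */
static void demo_unset(demo_zval *zv)
{
    if ((zv->type_flags & DEMO_FLAG_REFCOUNTED)
            && --zv->value.arr->refcount == 0) {
        free(zv->value.arr);
    }
    zv->type = DEMO_UNDEF;
    zv->type_flags = 0;
}
```

Copying an integer zval with `demo_copy` never allocates or counts anything, while copying an array zval shares the `demo_array` until the first write through `demo_array_push`.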

Types


PHP 7 supports the following types:

    // regular data types
    #define IS_UNDEF                    0
    #define IS_NULL                     1
    #define IS_FALSE                    2
    #define IS_TRUE                     3
    #define IS_LONG                     4
    #define IS_DOUBLE                   5
    #define IS_STRING                   6
    #define IS_ARRAY                    7
    #define IS_OBJECT                   8
    #define IS_RESOURCE                 9
    #define IS_REFERENCE                10

    // constant expressions
    #define IS_CONSTANT                 11
    #define IS_CONSTANT_AST             12

    // internal types
    #define IS_INDIRECT                 15
    #define IS_PTR                      17


This list differs from the PHP 5 one in several ways: IS_BOOL has been split into IS_FALSE and IS_TRUE, there is a new IS_UNDEF type (used, for example, for unset variables), PHP references have their own IS_REFERENCE type, and IS_INDIRECT and IS_PTR are special internal types.

The IS_LONG type now uses a zend_long value instead of an ordinary C long. The reason is that on 64-bit Windows a long is only 32 bits wide, so PHP 5 always ended up using 32-bit numbers on Windows. PHP 7 lets you use 64-bit numbers whenever the operating system is 64-bit.

In the next part we will take a closer look at the implementation of the individual zend_refcounted types. Here we will only look at how PHP references are implemented.

References


PHP 7 handles PHP references (&) in an entirely different way than PHP 5, and this change was one of the main sources of bugs in PHP 7. First, let's recall how PHP references work in PHP 5. Normally, copy-on-write means that a zval must be duplicated before it is modified, so that the change does not accidentally affect every other place sharing that zval. This matches by-value passing semantics.

For PHP references this does not apply. If a value is a PHP reference, you want the change to be visible to every user of the value. In PHP 5 the is_ref flag specifies whether a value is a PHP reference and therefore does not need to be separated before modification.

    $a = [];  // $a     -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
    $b =& $a; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])

    $b[] = 1; // $a = $b = zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
              // Because is_ref=1, PHP does not separate the zval


This approach has one significant problem: a value cannot be shared between two variables if one of them is a PHP reference and the other is not.

    $a = [];  // $a         -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
    $b = $a;  // $a, $b     -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
    $c = $b;  // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])

    $d =& $c; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
              // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
              // $d is a reference to $c, but not to $a and $b, so the zval has to be copied
              // here. Now the same value exists once with is_ref=0 and once with is_ref=1.

    $d[] = 1; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
              // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])
              // Because there are two separate zvals, $d[] = 1 does not modify $a and $b.


This behavior also makes references slower than ordinary values. Here is a less contrived example of the problem:

    $array = range(0, 1000000);
    $ref =& $array;
    var_dump(count($array)); // <-- separation occurs here


count() accepts its argument by value, but $array is a PHP reference, so a full copy of the array is made before it is passed to count(). If $array were not a reference, the value would simply be shared.

Now let's see how PHP references are implemented in PHP 7. Since zvals are no longer allocated individually, the PHP 5 approach cannot be used. Instead there is a new IS_REFERENCE type, which uses the zend_reference structure as its value:

    struct _zend_reference {
        zend_refcounted   gc;
        zval              val;
    };


Essentially, a zend_reference is a reference-counted zval. Every variable in a reference set stores a zval of type IS_REFERENCE that points to the same zend_reference instance. val behaves like any other zval; in particular, the complex value it points to can be shared.

Let's go through the examples above again, this time with PHP 7 semantics. For brevity, the zvals of the individual variables are omitted and only the structure they point to is shown.

    $a = [];  // $a     -> zend_array_1(refcount=1, value=[])

    $b =& $a; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])

    $b[] = 1; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[1])


The by-reference assignment created a new zend_reference. Note that the refcount of the reference is 2 (two variables are part of the PHP reference set), but the refcount of the value itself is 1, because it is referenced by a single zend_reference structure. Now consider the case where references and non-references are mixed:

    $a = [];  // $a         -> zend_array_1(refcount=1, value=[])
    $b = $a;  // $a, $b     -> zend_array_1(refcount=2, value=[])
    $c = $b;  // $a, $b, $c -> zend_array_1(refcount=3, value=[])

    $d =& $c; // $a, $b                                 -> zend_array_1(refcount=3, value=[])
              // $c, $d -> zend_reference_1(refcount=2) ---^
              // Note that all variables share the same zend_array, even though some are
              // PHP references and some are not.

    $d[] = 1; // $a, $b                                 -> zend_array_1(refcount=2, value=[])
              // $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
              // Only now, when a write happens, is the zend_array duplicated.


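The important difference from PHP 5 is that all variables can share the same array, even though some of them are PHP references and some are not. The array is separated only when a modification is actually made. This means that in PHP 7 it is safe to pass a large referenced array to count(): it will not be duplicated. References are still slower than ordinary values, because they require a zend_reference structure to be allocated and add a level of indirection through it.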
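The mixed case can be modelled in C as well. This is a hedged sketch with hypothetical `demo_` names: the reference holds a plain pointer where the real zend_reference embeds a zval, and the array carries only a length. It shows that creating the reference leaves the shared array alone and that the copy happens on the first write.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct demo_array {
    uint32_t refcount;   /* how many zvals or references point at this array */
    size_t   len;        /* element storage is omitted to keep the sketch short */
} demo_array;

typedef struct demo_reference {
    uint32_t    refcount;   /* number of variables in the PHP reference set */
    demo_array *val;        /* stands in for the embedded zval of zend_reference */
} demo_reference;

/* $d =& $c, where $c currently points at `shared`: the new reference takes
 * over the count that $c held on the array, so the array's refcount is
 * unchanged and no copy is made. */
static demo_reference *demo_make_reference(demo_array *shared)
{
    demo_reference *ref = malloc(sizeof *ref);
    if (ref == NULL) {
        abort();
    }
    ref->refcount = 2;   /* $c and $d */
    ref->val = shared;
    return ref;
}

/* $d[] = ...: only a write through the reference separates the array. */
static void demo_reference_push(demo_reference *ref)
{
    assert(ref->val->refcount > 0);
    if (ref->val->refcount > 1) {
        demo_array *dup = malloc(sizeof *dup);
        if (dup == NULL) {
            abort();
        }
        dup->refcount = 1;
        dup->len = ref->val->len;
        ref->val->refcount--;
        ref->val = dup;
    }
    ref->val->len++;
}
```

A read-only operation such as count() would simply follow `ref->val` and never trigger the duplication in `demo_reference_push`.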

Conclusion


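To summarize: the main change in PHP 7 is that zvals are no longer allocated individually on the heap and no longer store a refcount themselves. Instead, the complex values they point to, such as strings, arrays and objects, store the refcount. This usually means fewer allocations, less indirection and lower memory usage. The remaining complex types are covered in the second part.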

Source: https://habr.com/ru/post/257999/

