This part looks at how the implementation of zval (Zend value) has changed since PHP 5, and also discusses the implementation of references. The second part will examine the implementation of individual data types, such as strings and objects, in detail.

In PHP 5, the zval structure is defined as follows:

    typedef struct _zval_struct {
        zvalue_value value;
        zend_uint refcount__gc;
        zend_uchar type;
        zend_uchar is_ref__gc;
    } zval;
As you can see, a zval consists of a value, a type and additional __gc information, which I will discuss below. The value member is a union of the various possible values that a zval can store:

    typedef union _zvalue_value {
        long lval;              // booleans, integers and resources
        double dval;            // floating-point numbers
        struct {                // strings
            char *val;
            int len;
        } str;
        HashTable *ht;          // arrays
        zend_object_value obj;  // objects
        zend_ast *ast;          // constant expressions
    } zvalue_value;
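In a C union only one member is active at a time, and the size of the union equals the size of its largest member. All members are stored at the same memory location and are interpreted differently depending on which one you access. If you read the lval member, its value is interpreted as a signed integer; if you read dval, the same bytes are interpreted as a double-precision floating-point number, and so on.

To know which member is currently in use, the type field of the zval stores a type tag, which is simply an integer:

    #define IS_NULL     0  /* doesn't use value */
    #define IS_LONG     1  /* uses lval */
    #define IS_DOUBLE   2  /* uses dval */
    #define IS_BOOL     3  /* uses lval with values 0 and 1 */
    #define IS_ARRAY    4  /* uses ht */
    #define IS_OBJECT   5  /* uses obj */
    #define IS_STRING   6  /* uses str */
    #define IS_RESOURCE 7  /* uses lval, which holds the resource ID */

    /* special types used for late binding of constants */
    #define IS_CONSTANT     8
    #define IS_CONSTANT_AST 9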
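To make the tag-plus-union scheme concrete, here is a minimal C sketch. The names (my_zval, MY_IS_LONG and the make_* helpers) are hypothetical, and only integers, booleans, doubles and null are modelled; the point is that code consults the type tag before deciding which union member to read.

```c
#include <assert.h>

/* Hypothetical, trimmed-down copy of the PHP 5 scheme: a type tag plus a
   value union. Only a few of the PHP 5 types are modelled. */
enum { MY_IS_NULL = 0, MY_IS_LONG = 1, MY_IS_DOUBLE = 2, MY_IS_BOOL = 3 };

typedef union {
    long lval;    /* integers and booleans */
    double dval;  /* floating-point numbers */
} my_value;

typedef struct {
    my_value value;
    unsigned char type;  /* tells which union member is active */
} my_zval;

my_zval make_long(long l) {
    my_zval zv;
    zv.type = MY_IS_LONG;
    zv.value.lval = l;
    return zv;
}

my_zval make_double(double d) {
    my_zval zv;
    zv.type = MY_IS_DOUBLE;
    zv.value.dval = d;
    return zv;
}

my_zval make_null(void) {
    my_zval zv;
    zv.type = MY_IS_NULL;
    zv.value.lval = 0;  /* unused, but keep the bytes defined */
    return zv;
}

/* Reads the value as a double, choosing the union member by the tag. */
double my_zval_to_double(my_zval zv) {
    switch (zv.type) {
    case MY_IS_LONG:
    case MY_IS_BOOL:
        return (double) zv.value.lval;  /* lval is the active member */
    case MY_IS_DOUBLE:
        return zv.value.dval;           /* dval is the active member */
    default:
        return 0.0;                     /* IS_NULL carries no value */
    }
}
```

Reading dval from a zval tagged as a long would reinterpret the integer's bytes, which is exactly the mistake the tag exists to prevent.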
In PHP 5, zvals are (almost always) allocated on the heap, and PHP needs a way to keep track of which zvals are in use and which ones can be freed. Reference counting is used for this: the refcount__gc member stores how many times the zval is currently referenced. For example, in $a = $b = 42 the value 42 is referenced by two variables, so its refcount is 2. When the refcount drops to zero, the value is no longer used and can be freed.

A closely related concept is copy-on-write: a zval can be shared by several users only as long as it is not modified. To modify a shared zval, it must first be duplicated (separated), and the modification is then carried out on the copy.

Here is an example that shows both copy-on-write and zval destruction:

    $a = 42;   // $a         -> zval_1(type=IS_LONG, value=42, refcount=1)
    $b = $a;   // $a, $b     -> zval_1(type=IS_LONG, value=42, refcount=2)
    $c = $b;   // $a, $b, $c -> zval_1(type=IS_LONG, value=42, refcount=3)

    // the following line causes a zval separation
    $a += 1;   // $b, $c -> zval_1(type=IS_LONG, value=42, refcount=2)
               // $a     -> zval_2(type=IS_LONG, value=43, refcount=1)

    unset($b); // $c -> zval_1(type=IS_LONG, value=42, refcount=1)
               // $a -> zval_2(type=IS_LONG, value=43, refcount=1)

    unset($c); // zval_1 is destroyed because refcount=0
               // $a -> zval_2(type=IS_LONG, value=43, refcount=1)
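The refcounting and copy-on-write behaviour in the example above can be sketched in C. This is a hypothetical illustration, not engine code: a heap-allocated box stands in for a PHP 5 zval holding an integer, and box_separate plays the role of zval separation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PHP 5 zval holding an integer: every
   variable points at a heap-allocated box that carries its own refcount. */
typedef struct {
    long lval;
    unsigned refcount;
} box;

box *box_new(long v) {
    box *b = malloc(sizeof *b);
    assert(b != NULL);
    b->lval = v;
    b->refcount = 1;
    return b;
}

/* $b = $a: share the box and bump its refcount. */
box *box_share(box *b) {
    b->refcount++;
    return b;
}

/* unset($x): drop one reference; free the box when nobody uses it. */
void box_release(box *b) {
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        free(b);
    }
}

/* Copy-on-write: before a modification, separate if the box is shared. */
box *box_separate(box *b) {
    if (b->refcount == 1) {
        return b;             /* sole owner: modify in place */
    }
    b->refcount--;            /* this variable stops sharing the old box */
    return box_new(b->lval);  /* independent copy with refcount 1 */
}
```

Replaying the PHP example (share twice, separate before $a += 1, then unset $b and $c) yields the refcounts shown in its comments.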
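Reference counting has one serious flaw: it cannot detect and release cyclic references. To deal with this, PHP uses an additional cycle collector. Whenever the refcount of a zval is decremented and there is a chance that the zval is part of a cycle, it is written to the root buffer. When this buffer is full, potential cycles are marked and collected by the garbage collector.

To support the cycle collector, the zval structure that is actually used is the following:

    typedef struct _zval_gc_info {
        zval z;
        union {
            gc_root_buffer       *buffered;
            struct _zval_gc_info *next;
        } u;
    } zval_gc_info;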
The zval_gc_info structure embeds a regular zval plus one additional pointer. Note that u is a union, so this is really a single pointer that can have one of two types. The buffered pointer records where in the root buffer this zval is referenced, so that it can be removed from the buffer if the zval is destroyed before the cycle collector runs (which is very likely). next is used when the collector destroys values.

Let's talk about sizes (all figures are for 64-bit systems). First, the zvalue_value union is 16 bytes, because both the str and obj members have that size. The whole zval structure is 24 bytes (due to padding), and zval_gc_info is 32 bytes. On top of that, allocating the zval on the heap adds another 16 bytes of allocator overhead. So each zval takes 48 bytes in total, although a single zval may be shared by several places.

This implementation is inefficient in many ways. Consider a zval that stores a simple integer, which by itself takes 8 bytes. The type tag must be stored as well; it occupies one byte, but because of padding it effectively needs another 8. To the resulting 16 bytes we add another 16 for reference counting and the cycle collector, and another 16 of heap allocation overhead. Not to mention that the allocation and the subsequent free are both quite expensive operations.

The main problems with zval in PHP 5:

- Zvals (almost) always have to be allocated on the heap.
- Zvals are always reference counted and always carry cycle collection information, even where sharing the value is not worth the cost (an integer) or where cycles cannot occur at all.
- Because the zvals themselves are refcounted, values can only be shared between zvals. For example, a string cannot be shared between a zval and a hash table key (without also storing that key as a zval).

This brings us to the new zval implementation in PHP 7. The fundamental change is that zvals no longer need to be allocated separately on the heap and no longer store a refcount themselves. Instead, any complex value they point to (a string, array or object) stores the refcount. This gives the following benefits:

- Simple values require neither a separate allocation nor reference counting.
- Because the refcount is stored in the value itself, the value can be shared independently of the zval structure: a string can be used in a zval and as a hash table key at the same time.

Here is how the new zval is defined:

    struct _zval_struct {
        zend_value value;
        union {
            struct {
                ZEND_ENDIAN_LOHI_4(
                    zend_uchar type,
                    zend_uchar type_flags,
                    zend_uchar const_flags,
                    zend_uchar reserved)
            } v;
            uint32_t type_info;
        } u1;
        union {
            uint32_t var_flags;
            uint32_t next;        // hash collision chain
            uint32_t cache_slot;  // literal cache slot
            uint32_t lineno;      // line number (for ast nodes)
            uint32_t num_args;    // arguments number for EX(This)
            uint32_t fe_pos;      // foreach position
            uint32_t fe_iter_idx; // foreach iterator index
        } u2;
    };
The first member is still the value union. The second is an integer that stores type information, split into individual bytes with a union (you can ignore the ZEND_ENDIAN_LOHI_4 macro; it only ensures a consistent layout across platforms with different byte orders). The important parts of this nested structure are type and type_flags, which I will discuss below.

There is a small problem here: value takes 8 bytes, so because of padding, adding even a single byte grows the zval to 16 bytes. But we certainly don't need 8 bytes just to store the type. That is why the zval has the additional u2 union, which is unused by default but can be used by the surrounding code to store 4 bytes of data. The different union members correspond to the different uses of this extra storage.

The value union is slightly different from PHP 5:

    typedef union _zend_value {
        zend_long         lval;
        double            dval;
        zend_refcounted  *counted;
        zend_string      *str;
        zend_array       *arr;
        zend_object      *obj;
        zend_resource    *res;
        zend_reference   *ref;
        zend_ast_ref     *ast;

        // special members, ignore these for now
        zval             *zv;
        void             *ptr;
        zend_class_entry *ce;
        zend_function    *func;
        struct {
            ZEND_ENDIAN_LOHI(
                uint32_t w1,
                uint32_t w2)
        } ww;
    } zend_value;
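The byte counts quoted for both versions can be checked with mock structures. This sketch assumes a 64-bit LP64 platform and replaces engine types with simplified stand-ins (void pointers, a hypothetical mock_object_value modelled as a 32-bit handle plus a pointer), so it illustrates the padding rules rather than reproducing PHP's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock-ups of the two layouts, not the engine's headers.
   Sizes in the comments assume a 64-bit (LP64) ABI. */
typedef struct { uint32_t handle; void *handlers; } mock_object_value;

typedef union {
    long lval;
    double dval;
    struct { char *val; int len; } str;  /* 8 + 4, padded to 16 */
    void *ht;
    mock_object_value obj;               /* 4 + 4 padding + 8 = 16 */
    void *ast;
} php5_value;

typedef struct {
    php5_value value;          /* 16 bytes */
    uint32_t refcount__gc;     /* 4 bytes */
    unsigned char type;        /* 1 byte */
    unsigned char is_ref__gc;  /* 1 byte, then 2 bytes of tail padding */
} php5_zval;

typedef struct {
    php5_zval z;
    union { void *buffered; void *next; } u;
} php5_zval_gc_info;

typedef union {
    int64_t lval;   /* zend_long on a 64-bit build */
    double dval;
    void *counted;  /* every complex value is just a pointer */
} php7_value;

/* An 8-byte value plus a one-byte tag already rounds up to 16 bytes... */
typedef struct {
    php7_value value;
    unsigned char type;
} php7_zval_naive;

/* ...so PHP 7 spends those bytes on u1 (type info) and u2 (spare slot). */
typedef struct {
    php7_value value;
    union { uint32_t type_info; } u1;
    union { uint32_t next; uint32_t lineno; } u2;
} php7_zval;
```

The naive layout wastes 7 bytes of padding; the real PHP 7 zval keeps the same 16-byte size but puts those bytes to work.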
First of all, value now takes 8 bytes instead of 16. It stores only integers (lval) and floating-point numbers (dval) directly; everything else is a pointer. All pointer types (except the special ones marked above) use reference counting and start with a common header defined by zend_refcounted:

    struct _zend_refcounted {
        uint32_t refcount;
        union {
            struct {
                ZEND_ENDIAN_LOHI_3(
                    zend_uchar type,
                    zend_uchar flags,
                    uint16_t   gc_info)
            } v;
            uint32_t type_info;
        } u;
    };
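Because every refcounted structure begins with the same header, generic code can manage the refcount without knowing the concrete type. Here is a hypothetical sketch of that common-header pattern; my_refcounted, my_string and my_array are illustrative names, not PHP's structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical common header, laid out like the one above. */
typedef struct {
    uint32_t refcount;
    uint8_t type;      /* meant to mirror the zval type */
    uint8_t flags;
    uint16_t gc_info;
} my_refcounted;

/* Each refcounted type starts with the header as its first member. */
typedef struct { my_refcounted gc; uint32_t len; char val[16]; } my_string;
typedef struct { my_refcounted gc; uint32_t count; } my_array;

/* Work for any structure whose first member is my_refcounted. */
uint32_t my_addref(my_refcounted *rc) {
    return ++rc->refcount;
}

uint32_t my_delref(my_refcounted *rc) {
    assert(rc->refcount > 0);
    return --rc->refcount;
}
```

C guarantees that a pointer to a structure, suitably converted, points to its first member, so casting a my_array pointer to my_refcounted pointer is well defined.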
Besides the refcount, the structure contains type, flags and gc_info. type simply duplicates the type of the zval and allows the GC to distinguish between different refcounted structures without storing a zval. flags is used for different purposes by different data types; I will describe it in more detail in the second part.

gc_info is the counterpart of buffered in the old zval. But instead of a pointer into the root buffer, it now stores an index into it. Since the root buffer has a fixed capacity (10,000 elements), a 16-bit index is enough instead of a 64-bit pointer. gc_info also encodes the “color” of the node, which is used to mark nodes during collection.

As mentioned, zvals no longer need to be allocated separately on the heap, but they still have to be stored somewhere. They are still part of heap-allocated structures, but are now embedded in them directly. For example, a hash table bucket contains its own zval instead of a pointer to a separate zval. The compiled variables table of a function and the property table of an object are arrays of zvals. As a result, zvals are now usually stored with one less level of indirection: what used to be a zval* is now a zval.

Previously, using a zval in a new place meant copying a zval* and incrementing its refcount. Now you copy the contents of the zval (ignoring u2) and, if the value it points to uses reference counting, increment that value's refcount.

How does PHP know whether a value is refcounted? This cannot be determined from the type alone, because some types, such as strings and arrays, are not always refcounted. Instead, one bit of the zval's type_info member determines whether the zval is refcounted, and further bits encode other properties of the type:

    #define IS_TYPE_CONSTANT    (1<<0)  /* special */
    #define IS_TYPE_IMMUTABLE   (1<<1)  /* special */
    #define IS_TYPE_REFCOUNTED  (1<<2)
    #define IS_TYPE_COLLECTABLE (1<<3)
    #define IS_TYPE_COPYABLE    (1<<4)
    #define IS_TYPE_SYMBOLTABLE (1<<5)  /* special */
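The copy rule just described (copy the zval contents except u2, then bump the refcount only when the refcounted bit is set) can be sketched as follows. This is a simplified, hypothetical model: MY_IS_TYPE_REFCOUNTED mirrors IS_TYPE_REFCOUNTED, and keeping the type in the low byte of type_info with the type flags in the next byte is an assumption based on the u1.v layout shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PHP 7-style zval; not the engine's macros. */
#define MY_IS_TYPE_REFCOUNTED (1 << 2)
#define MY_FLAGS_SHIFT 8  /* assumed position of the type flags byte */

typedef struct { uint32_t refcount; } my_refcounted;

typedef struct {
    union { int64_t lval; my_refcounted *counted; } value;
    uint32_t type_info;  /* low byte: type, next byte: type flags */
    uint32_t u2;         /* spare slot owned by the surrounding code */
} my_zval;

static int my_is_refcounted(const my_zval *zv) {
    return ((zv->type_info >> MY_FLAGS_SHIFT) & MY_IS_TYPE_REFCOUNTED) != 0;
}

/* Uses src in a new place: copy value and type_info but not u2, then
   increment the refcount of the pointed-to value only if it has one. */
void my_zval_copy(my_zval *dst, const my_zval *src) {
    dst->value = src->value;
    dst->type_info = src->type_info;
    if (my_is_refcounted(src)) {
        assert(src->value.counted != NULL);
        src->value.counted->refcount++;
    }
}
```

An integer zval is copied without touching any counter, while an array zval shares the same array and increments its refcount.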
The three main properties a type can have are refcounted, collectable and copyable. You already know what refcounted means. Collectable means that the zval can take part in a cycle. For example, strings are often refcounted, but there is no way to create a cycle with them.

Copyable determines whether a value must be copied when a duplication is performed. A duplication is a hard copy: if you duplicate a zval pointing to an array, the array's refcount is not simply incremented; instead, a new independent copy of the array is created. For some types, however, such as objects and resources, duplication only increments the refcount. Such types are called non-copyable. This matches the passing semantics of objects and resources (which, for the record, are not passed by reference).

The following table shows which flags each type uses. “Simple types” are types such as integers or booleans that do not point to a separate structure; the immutable column marks immutable arrays.

                    | refcounted | collectable | copyable | immutable
    ----------------+------------+-------------+----------+----------
    simple types    |            |             |          |
    string          |     x      |             |    x     |
    interned string |            |             |          |
    array           |     x      |      x      |    x     |
    immutable array |            |             |          |    x
    object          |     x      |      x      |          |
    resource        |     x      |             |          |
    reference       |     x      |             |          |
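The difference between copyable and non-copyable types during a duplication can be sketched as follows. This is a hypothetical model: MY_REFCOUNTED and MY_COPYABLE only mirror IS_TYPE_REFCOUNTED and IS_TYPE_COPYABLE, and a tiny fixed-size box stands in for both an array and an object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values mirroring the ones above. */
#define MY_REFCOUNTED (1 << 2)
#define MY_COPYABLE   (1 << 4)

typedef struct {
    uint32_t refcount;
    size_t len;
    long items[4];
} my_box;

typedef struct {
    my_box *ptr;
    uint32_t flags;
} my_zval;

/* Duplicates src: copyable values (arrays) get an independent copy,
   non-copyable refcounted values (objects, resources) are only shared. */
my_zval my_zval_dup(my_zval src) {
    my_zval dst = src;
    if (src.flags & MY_COPYABLE) {
        dst.ptr = malloc(sizeof *dst.ptr);
        assert(dst.ptr != NULL);
        memcpy(dst.ptr, src.ptr, sizeof *dst.ptr);
        dst.ptr->refcount = 1;  /* the new copy starts unshared */
    } else if (src.flags & MY_REFCOUNTED) {
        src.ptr->refcount++;    /* same structure, one more user */
    }
    return dst;
}
```

Duplicating the array leaves the original's refcount at 1 and yields a second box, whereas duplicating the object yields the same pointer with refcount 2.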
Let's look at two examples of how zval management works in practice. First, an example with integers, based on the PHP 5 example above:

    $a = 42;   // $a = zval_1(type=IS_LONG, value=42)

    $b = $a;   // $a = zval_1(type=IS_LONG, value=42)
               // $b = zval_2(type=IS_LONG, value=42)

    $a += 1;   // $a = zval_1(type=IS_LONG, value=43)
               // $b = zval_2(type=IS_LONG, value=42)

    unset($a); // $a = zval_1(type=IS_UNDEF)
               // $b = zval_2(type=IS_LONG, value=42)
This is pretty boring: since integers are no longer shared, the two variables use separate zvals. Remember that zvals are now embedded rather than allocated separately; this is indicated by writing = instead of ->. When a variable is unset, the type of the corresponding zval is set to IS_UNDEF. Now consider a more interesting case involving a complex value:

    $a = [];   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

    $b = $a;   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
               // $b = zval_2(type=IS_ARRAY) ---^

    // separation occurs here
    $a[] = 1;  // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
               // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

    unset($a); // $a = zval_1(type=IS_UNDEF), zend_array_2 is destroyed
               // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
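The array example can be modelled with zvals embedded side by side in an array of variables, where only the array value is heap-allocated and refcounted. This is a hypothetical sketch with made-up names (my_zval, my_array, assign, append, unset_var) and a fixed-capacity array to stay short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: zvals are embedded, only the array is allocated. */
enum { T_UNDEF = 0, T_ARRAY = 7 };

typedef struct { unsigned refcount; size_t len; long items[8]; } my_array;
typedef struct { my_array *arr; unsigned char type; } my_zval;

my_array *array_new(void) {
    my_array *a = calloc(1, sizeof *a);
    assert(a != NULL);
    a->refcount = 1;
    return a;
}

void array_release(my_array *a) {
    assert(a->refcount > 0);
    if (--a->refcount == 0) free(a);
}

/* $dst = $src: copy the embedded zval and bump the array's refcount.
   Assumes dst does not hold a value yet. */
void assign(my_zval *dst, const my_zval *src) {
    *dst = *src;
    dst->arr->refcount++;
}

/* $zv[] = value: separate first if the array is shared. */
void append(my_zval *zv, long value) {
    if (zv->arr->refcount > 1) {
        my_array *copy = array_new();
        memcpy(copy->items, zv->arr->items, sizeof copy->items);
        copy->len = zv->arr->len;
        zv->arr->refcount--;
        zv->arr = copy;
    }
    assert(zv->arr->len < 8);
    zv->arr->items[zv->arr->len++] = value;
}

/* unset($zv): release the array and mark the slot IS_UNDEF. */
void unset_var(my_zval *zv) {
    array_release(zv->arr);
    zv->arr = NULL;
    zv->type = T_UNDEF;
}
```

Running $a = [], $b = $a, $a[] = 1 and unset($a) against two slots reproduces the refcounts in the comments above.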
Here each variable still has its own (embedded) zval, but both zvals point to the same (refcounted) zend_array structure. As soon as a modification is made, the array has to be duplicated. In PHP 5, things work the same way in this case.

Let's take a closer look at the types supported in PHP 7:

    // regular data types
    #define IS_UNDEF        0
    #define IS_NULL         1
    #define IS_FALSE        2
    #define IS_TRUE         3
    #define IS_LONG         4
    #define IS_DOUBLE       5
    #define IS_STRING       6
    #define IS_ARRAY        7
    #define IS_OBJECT       8
    #define IS_RESOURCE     9
    #define IS_REFERENCE    10

    // constant expressions
    #define IS_CONSTANT     11
    #define IS_CONSTANT_AST 12

    // internal types
    #define IS_INDIRECT     15
    #define IS_PTR          17
This list is quite similar to the one used in PHP 5, but there are a few additions:

- The IS_UNDEF type is used where a NULL zval pointer (not to be confused with an IS_NULL zval) was used before. For example, in the examples above, variables that have been unset get the IS_UNDEF type.
- The IS_BOOL type has been split into IS_FALSE and IS_TRUE. Since the boolean value is now encoded in the type itself, a number of type-based checks can be optimized. This change is invisible to users, who still work with a single “boolean” type.
- PHP references no longer use an is_ref flag in the zval. Instead, a new IS_REFERENCE type has been introduced. How it works is described below.
- IS_INDIRECT and IS_PTR are special internal types.

The IS_LONG type now uses a zend_long value instead of an ordinary C long. The reason is that on 64-bit Windows a long is only 32 bits wide, so PHP 5 always ended up using 32-bit integers on Windows. PHP 7 lets you use 64-bit integers on any 64-bit operating system, including Windows.

The individual zend_refcounted types will be discussed in the second part. Here we confine ourselves to the implementation of PHP references, which PHP 7 handles in a completely different way than PHP 5. Let's first look at how they worked in PHP 5.

Normally, the copy-on-write principle says that a zval must be separated before it is modified, so that the value does not accidentally change for every place sharing the zval. This corresponds to by-value passing semantics.

For PHP references this does not apply: if a value is a PHP reference, you want it to change for every user of the value. The is_ref flag of PHP 5 zvals determines whether a value is a PHP reference and therefore whether it must be separated before modification:

    $a = [];  // $a     -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
    $b =& $a; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])

    $b[] = 1; // $a = $b = zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
              //
              // since is_ref=1, PHP does not separate the zval
The problem with this design is that a value cannot be shared between a variable that is a PHP reference and one that is not. Consider the following example:

    $a = [];  // $a         -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
    $b = $a;  // $a, $b     -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
    $c = $b;  // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])

    $d =& $c; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
              // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
              // $d is a reference to $c, but not to $a and $b, so the zval has to be
              // copied here. Now the same value exists once with is_ref=0 and once with is_ref=1.

    $d[] = 1; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
              // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])
              // because there are two separate zvals, $d[] = 1 does not modify $a and $b
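The PHP 5 rules at work in these examples depend only on a zval's refcount and is_ref flag. The following hypothetical sketch states them as two predicates; the names are illustrative and the model ignores everything else the engine tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a PHP 5 zval's sharing state. */
typedef struct {
    unsigned refcount;
    bool is_ref;
} php5_zval_info;

/* Before a write: a shared zval is separated unless it is a PHP
   reference, in which case every variable in the set sees the change. */
bool needs_separation_on_write(php5_zval_info zv) {
    return zv.refcount > 1 && !zv.is_ref;
}

/* $d =& $c, or passing a reference where a value is expected: a shared
   zval must be copied if its is_ref flag would have to change, because
   one zval cannot be a reference for some users and a plain value for
   others. */
bool needs_copy_on_ref_change(php5_zval_info zv, bool want_ref) {
    return zv.refcount > 1 && zv.is_ref != want_ref;
}
```

The second predicate is what forces the copy at $d =& $c above, and, in the other direction, when a referenced array is passed by value.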
Here is a less contrived example of the problem:

    $array = range(0, 1000000);
    $ref =& $array;
    var_dump(count($array)); // <-- separation occurs here
Because count() accepts its argument by value, but $array is a PHP reference, a complete copy of the array is made before it is passed to count(). If $array were not a reference, the value would be shared instead.

Now let's turn to the PHP 7 implementation of PHP references. Since zvals are no longer allocated separately, the PHP 5 approach cannot be used. Instead, there is a new IS_REFERENCE type that uses the zend_reference structure as its value:

    struct _zend_reference {
        zend_refcounted gc;
        zval            val;
    };
So a zend_reference is essentially a zval with reference counting. All variables in a reference set store a zval of type IS_REFERENCE pointing to the same zend_reference instance. The val member behaves like any other zval; in particular, the complex value it points to can be shared.

Let's go through the examples above again, this time with PHP 7 semantics. For brevity, I will no longer show the individual zvals of the variables, only the structures they point to.

    $a = [];  // $a                                     -> zend_array_1(refcount=1, value=[])
    $b =& $a; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])

    $b[] = 1; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[1])
The by-reference assignment created a new zend_reference. Note that the refcount of the reference is 2 (because two variables are part of the PHP reference set), while the value itself has a refcount of only 1 (because a single zend_reference structure points to it). Now consider the case where references and non-references are mixed:

    $a = [];  // $a         -> zend_array_1(refcount=1, value=[])
    $b = $a;  // $a, $b     -> zend_array_1(refcount=2, value=[])
    $c = $b;  // $a, $b, $c -> zend_array_1(refcount=3, value=[])

    $d =& $c; // $a, $b                                 -> zend_array_1(refcount=3, value=[])
              // $c, $d -> zend_reference_1(refcount=2) ---^
              // all variables share the same zend_array, even though
              // some are PHP references and some are not

    $d[] = 1; // $a, $b                                 -> zend_array_1(refcount=2, value=[])
              // $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
              // only now, when the write happens, is the zend_array duplicated
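The mixed example can be reproduced with a small model of PHP 7 references. This is a hypothetical sketch, not engine code: ref_t stands in for zend_reference, arr_t for zend_array, and the helper names (deref, assign_value, assign_ref, append) are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: a variable's zval points either directly at an
   array or at a refcounted reference box whose inner zval holds it. */
enum { T_ARRAY = 7, T_REFERENCE = 10 };

typedef struct { unsigned refcount; size_t len; long items[8]; } arr_t;
typedef struct { unsigned char type; void *ptr; } zv_t;  /* arr_t* or ref_t* */
typedef struct { unsigned refcount; zv_t val; } ref_t;   /* like zend_reference */

arr_t *arr_new(void) {
    arr_t *a = calloc(1, sizeof *a);
    assert(a != NULL);
    a->refcount = 1;
    return a;
}

/* Returns the zval that actually holds the value, looking through a reference. */
zv_t *deref(zv_t *zv) {
    return zv->type == T_REFERENCE ? &((ref_t *) zv->ptr)->val : zv;
}

/* $dst = $src (by value, e.g. passing to count()): share the array. */
void assign_value(zv_t *dst, zv_t *src) {
    *dst = *deref(src);
    ((arr_t *) dst->ptr)->refcount++;
}

/* $dst =& $src: move src's zval into a reference box once, then share the box. */
void assign_ref(zv_t *dst, zv_t *src) {
    if (src->type != T_REFERENCE) {
        ref_t *r = malloc(sizeof *r);
        assert(r != NULL);
        r->refcount = 1;
        r->val = *src;  /* array refcount unchanged: the zval just moved */
        src->type = T_REFERENCE;
        src->ptr = r;
    }
    ((ref_t *) src->ptr)->refcount++;
    *dst = *src;
}

/* $zv[] = value: write through a reference if present, separating the
   array only when it is shared. */
void append(zv_t *zv, long value) {
    zv_t *target = deref(zv);
    arr_t *a = target->ptr;
    if (a->refcount > 1) {
        arr_t *copy = arr_new();
        memcpy(copy, a, sizeof *copy);
        copy->refcount = 1;
        a->refcount--;
        target->ptr = a = copy;
    }
    assert(a->len < 8);
    a->items[a->len++] = value;
}
```

After the write through $d, passing $d by value (as count() would receive it) only bumps the refcount of the referenced array instead of copying it.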
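The important difference from PHP 5 is that all variables can share the same array, even though some of them are PHP references and some are not. The array is separated only when it is modified. This means that in PHP 7 it is safe to pass a large referenced array to count(): it will not be duplicated. References are still slower than plain values, because they require allocating a zend_reference structure and an extra indirection through it.

To sum up, the main change in PHP 7 is that zvals are no longer allocated separately on the heap and no longer store a refcount themselves; instead, the complex values they point to carry the refcount.

Source: https://habr.com/ru/post/257999/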