Good day, dear Habr community!
Below the cut is some information about writing PHP extensions in C++, which I gathered from various sources (mostly English-language ones) and from digging through the Zend Engine 2 source code while developing a module for my own needs. Since there is a lot of material, I have tried to be brief.
So, in this part:
But we will not get to C++ in this part... =)
A small disclaimer: this article is not the ultimate truth. It is not based on official documentation (is there any?) and reflects my subjective view of ZE 2. Still, back when I started, I would have been glad to find something like it on the Runet to save time in the early stages of development.
Inner world of Zend Engine 2
Basic data types
Zend Engine 2 is written in C, and this has greatly influenced its internal ecosystem. In the absence of a class-object paradigm, ZE 2 is built on global variables, free functions, and structures, which are C's approximation of user-defined data types. For every occasion there is some combination of simple and composite data types and procedures for processing them.
The most common structure is zval (zend value?). It is the representation of a PHP variable on the other side of userspace (by userspace we mean, from here on, code written in PHP, which ZE executes; the other side of userspace is C code).
PHP is a language with weak dynamic typing and automatic memory management: variables can change their type during their lifetime and do not need to be explicitly destroyed by the programmer once they are no longer needed (the garbage collector takes care of that). It is partly zval that has to pay for these whims. Currently (PHP 5.3.3), the structure is defined as follows (zend.h):
typedef struct _zval_struct zval;

struct _zval_struct {
    zvalue_value value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
};
What do we see here? Not surprisingly, a zval is not itself the value of a variable. The value is stored in the value field, whose type is zvalue_value (see below). Which kind of value is stored there is determined by the type field, which can be one of the following (zend.h):
#define IS_NULL           0
#define IS_LONG           1
#define IS_DOUBLE         2
#define IS_BOOL           3
#define IS_ARRAY          4
#define IS_OBJECT         5
#define IS_STRING         6
#define IS_RESOURCE       7
#define IS_CONSTANT       8
#define IS_CONSTANT_ARRAY 9
Yes, here they are: the very 8 PHP data types that people so often fail to enumerate at interviews! Plus two stand-alone values, IS_CONSTANT and IS_CONSTANT_ARRAY.
The structure also contains the number of references to the variable (refcount__gc) and a flag that tells whether the variable is a reference (is_ref__gc). This information is needed for efficient memory management inside ZE. For example, the following userspace code:
<?php
$foo = 5;
$bar = $foo;
creates a zval (call it val) of type IS_LONG (all userspace integers correspond to a C long on the other side of userspace), sets its is_ref__gc to 0 and refcount__gc to 1 (first line), and registers the symbol “foo” (more on this in the following parts) in the current symbol table (essentially just an associative array of variable names and their values) with val as the value. The second line does not create a new zval. Instead, the reference counter of the val that was just placed in the symbol table is increased by 1, and the symbol “bar” is registered in the symbol table with the same val. This reduces the number of memory allocations needed to create new variables.
When the interpreter then meets the code:
... $bar = '42';
it looks up the zval for the symbol “bar” in the current symbol table and first of all checks the number of references to this value. If it is greater than 1 and the zval is not a reference (is_ref__gc == 0), the interpreter creates a copy of the current value of $bar, performs the operation on the copy (in our case, assigns the string value '42'), and re-registers the symbol “bar” with the new value. If refcount__gc == 1 or is_ref__gc == 1, the operation is performed directly on the value obtained from the symbol table. Thus, in the following (rather artificial, but perfectly legitimate) situation:
<?php $foo = 100500;
the interpreter makes do with just one zval, although two symbols correspond to it. This is possible because the line with comment 1 creates a zval val with a reference count of 1. The line with comment 2 registers a new symbol “bar” with the value val, whose refcount__gc becomes 2, but the line with comment 3 does not create a new zval, since after the call to unset($bar) the number of references to val drops back to 1.
As you might guess, is_ref__gc becomes 1 when constructs of the form $b = &$a are encountered in userspace.
The approach described above can be called “separation upon modification” (separate on write).
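The sharing and separation rules described above can be sketched in plain C. This is only an illustration under simplifying assumptions: the names sketch_zval, sketch_new, sketch_share and sketch_write are hypothetical and not part of the Zend API, values are limited to long, and a "symbol" is modeled as a plain pointer variable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for zval: long values only. */
typedef struct sketch_zval {
    long value;
    unsigned int refcount;  /* how many symbols point at this value */
    unsigned char is_ref;   /* 1 if created through $b = &$a */
} sketch_zval;

/* $foo = 5; allocates a fresh value owned by a single symbol. */
sketch_zval *sketch_new(long v)
{
    sketch_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->value = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $bar = $foo; no allocation, the second symbol shares the value. */
sketch_zval *sketch_share(sketch_zval *z)
{
    z->refcount++;
    return z;
}

/* Write through a symbol slot, separating first if the value is shared. */
void sketch_write(sketch_zval **slot, long v)
{
    sketch_zval *z = *slot;
    if (z->refcount > 1 && !z->is_ref) {
        z->refcount--;          /* this symbol leaves the shared value */
        *slot = sketch_new(v);  /* and gets its own copy with the new data */
        return;
    }
    z->value = v;               /* sole owner or a reference: write in place */
}
```

With two pointers foo and bar obtained from sketch_new and sketch_share, a call such as sketch_write(&bar, 42) leaves foo untouched and gives bar its own value, while a later sketch_write(&foo, 7) modifies foo in place because it is the sole owner again.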
Now let's look at the type zvalue_value (zend.h):
typedef union _zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;
    zend_object_value obj;
} zvalue_value;
It is easy to see that this is a union. This means that the data stored at the address of a zvalue_value variable can be interpreted by the programmer, both at design time and at run time, as any of the data types included in the union. It is this property of zvalue_value that lets PHP userspace variables change their type so easily during their lifetime (recall that the current type of a zval can be found in its type field).
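To make the "one memory area, several interpretations" idea concrete, here is a minimal sketch of a tagged union in C. The names sketch_var, sketch_type and the sketch_set_* helpers are hypothetical; the real engine uses zval, the IS_* constants and its own macros for this.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type tags; the real ones are the IS_* constants. */
enum sketch_type { SK_LONG, SK_DOUBLE, SK_STRING };

typedef struct sketch_var {
    union {
        long lval;
        double dval;
        struct { const char *val; int len; } str;
    } value;                 /* one memory area, several interpretations */
    enum sketch_type type;   /* says which interpretation is valid now */
} sketch_var;

void sketch_set_long(sketch_var *v, long n)
{
    assert(v != NULL);
    v->value.lval = n;
    v->type = SK_LONG;
}

void sketch_set_double(sketch_var *v, double d)
{
    assert(v != NULL);
    v->value.dval = d;
    v->type = SK_DOUBLE;
}

/* Stores the pointer and the length; the string itself is not copied. */
void sketch_set_string(sketch_var *v, const char *s)
{
    assert(v != NULL && s != NULL);
    v->value.str.val = s;
    v->value.str.len = (int)strlen(s);
    v->type = SK_STRING;
}
```

The same sketch_var can hold 5, then '42', then 1.5 in turn; only the tag tells the reader which union member is currently meaningful, so the tag and the value must always be updated together.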
However, you may object that there are only 5 distinct fields here, while there are 8 data types. PHP types map onto the zvalue_value union as follows:
- long lval - integers (integer), booleans (boolean) and resources (resource)
- double dval - floating-point numbers (double)
- struct ... str - PHP strings (string)
- HashTable *ht - arrays (array)
- zend_object_value obj - PHP objects (object), both built-in (such as SPL) and user-defined
- null - no field is used: the variable has no value (zval.type == IS_NULL)
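The mapping above means that reading a value always starts with a check of the type tag. A minimal sketch, assuming a cut-down sketch_zval without arrays and objects; the IS_* values are copied from the zend.h excerpt earlier in the article, while sketch_describe and its output format are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Type tags copied from the zend.h excerpt in the article. */
#define IS_NULL     0
#define IS_LONG     1
#define IS_DOUBLE   2
#define IS_BOOL     3
#define IS_STRING   6
#define IS_RESOURCE 7

/* Hypothetical cut-down zval: no arrays, no objects. */
typedef struct sketch_zval {
    union {
        long lval;
        double dval;
        struct { char *val; int len; } str;
    } value;
    unsigned char type;
} sketch_zval;

/* Writes a short description of z into buf, picking the field by type. */
int sketch_describe(const sketch_zval *z, char *buf, size_t size)
{
    assert(z != NULL && buf != NULL);
    switch (z->type) {
    case IS_NULL:     return snprintf(buf, size, "null");
    case IS_LONG:     return snprintf(buf, size, "int(%ld)", z->value.lval);
    case IS_BOOL:     return snprintf(buf, size, "bool(%s)",
                                      z->value.lval ? "true" : "false");
    case IS_RESOURCE: return snprintf(buf, size, "resource(#%ld)",
                                      z->value.lval);
    case IS_DOUBLE:   return snprintf(buf, size, "float(%g)", z->value.dval);
    case IS_STRING:   /* length comes from len, not from strlen */
                      return snprintf(buf, size, "string(%d)",
                                      z->value.str.len);
    default:          return snprintf(buf, size, "unsupported");
    }
}
```

Note that three different tags (IS_LONG, IS_BOOL, IS_RESOURCE) all read the same lval member, and that the string branch relies on the stored length, so a string with an embedded null byte is still reported with its full length.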
So what useful conclusions can be drawn from looking at the zvalue_value union?
- The resource type is represented only by an integer identifier, which is why it looks like a dark horse in userspace. On the other side of userspace, such an identifier can be associated with an open file descriptor, a TCP socket, a database connection object, etc. The mapping goes through a special resource storage (on the C side, of course).
- All floating-point numbers in PHP have double precision and occupy 8 bytes.
- Strings can, in principle, be binary safe, i.e. contain null characters. This is achieved by storing the length of the string together with the pointer to the memory that holds it. It also makes strlen a fast operation in userspace. The terminating null character is optional. In practice, though, many extensions rely on null-terminated strings and do not shy away from the C strlen.
- Arrays are represented by the internal type HashTable. HashTable is yet another structure, but examining it and the principles of working with it is beyond the scope of this article.
- Objects in PHP are represented by the zend_object_value structure. We will discuss it below, since developing our extension involves creating our own data type.
- Creating a variable in PHP costs at least 16 bytes on a 32-bit architecture, whatever the type of the variable (add up the sizes of the zval fields, remembering that the size of a union equals the size of its largest member: 8 + 4 + 1 + 1 = 14 bytes, rounded up to 16 by alignment).
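The 16-byte figure from the last bullet can be checked with a small model of the 32-bit layout. This is a sketch under assumptions: the model_* types are hypothetical, pointers and long are replaced by 4-byte integers to imitate a 32-bit build on any machine, and the result relies on the padding rules of common ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit build: pointers and long take 4 bytes. */
typedef uint32_t ptr32;

typedef union model_zvalue_value {
    int32_t lval;                                     /* long: 4 bytes   */
    double dval;                                      /* double: 8 bytes */
    struct { ptr32 val; int32_t len; } str;           /* 4 + 4 = 8 bytes */
    ptr32 ht;                                         /* HashTable *: 4  */
    struct { uint32_t handle; ptr32 handlers; } obj;  /* 4 + 4 = 8 bytes */
} model_zvalue_value;

typedef struct model_zval {
    model_zvalue_value value;   /* 8 bytes, the largest union member */
    uint32_t refcount__gc;      /* 4 bytes */
    uint8_t type;               /* 1 byte  */
    uint8_t is_ref__gc;         /* 1 byte  */
                                /* 14 bytes so far, padded by the compiler */
} model_zval;

/* Returns the size of the modeled zval in bytes. */
size_t model_zval_size(void)
{
    assert(sizeof(model_zvalue_value) == 8);
    return sizeof(model_zval);
}
```

On typical x86 and x86-64 compilers model_zval_size() is expected to return 16: the 14 bytes of fields plus 2 bytes of trailing padding.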
So we have reached objects. The zend_object_value structure represents userspace variables that contain objects. And what is an object in the class-object paradigm? An object is a symbiosis of data and the methods that process it. Now let's look at the zend_object_value structure (zend_types.h):
typedef unsigned int zend_object_handle;
typedef struct _zend_object_handlers zend_object_handlers;

typedef struct _zend_object_value {
    zend_object_handle handle;
    zend_object_handlers *handlers;
} zend_object_value;
The structure combines an integer identifier (handle) with a pointer to another structure (zend_object_handlers *handlers), which holds pointers to functions that the ZE 2 engine calls when certain events related to the object occur. Such events include: assigning a variable that contains an object to another variable (zend_object_add_ref_t add_ref); a variable that contains an object going out of scope, being assigned another value, or being passed to unset (zend_object_del_ref_t del_ref); cloning an object, which calls __clone (zend_object_clone_obj_t clone_obj); reading a property of the object (zend_object_read_property_t read_property); writing a property of the object (zend_object_write_property_t write_property); and so on. The zend_object_handlers structure itself looks like this (zend_object_handlers.h):
struct _zend_object_handlers {
    zend_object_add_ref_t add_ref;
    zend_object_del_ref_t del_ref;
    zend_object_clone_obj_t clone_obj;
    zend_object_read_property_t read_property;
    zend_object_write_property_t write_property;
    zend_object_read_dimension_t read_dimension;
    zend_object_write_dimension_t write_dimension;
    zend_object_get_property_ptr_ptr_t get_property_ptr_ptr;
    zend_object_get_t get;
    zend_object_set_t set;
    zend_object_has_property_t has_property;
    zend_object_unset_property_t unset_property;
    zend_object_has_dimension_t has_dimension;
    zend_object_unset_dimension_t unset_dimension;
    zend_object_get_properties_t get_properties;
    zend_object_get_method_t get_method;
    zend_object_call_method_t call_method;
    zend_object_get_constructor_t get_constructor;
    zend_object_get_class_entry_t get_class_entry;
    zend_object_get_class_name_t get_class_name;
    zend_object_compare_t compare_objects;
    zend_object_cast_t cast_object;
    zend_object_count_elements_t count_elements;
    zend_object_get_debug_info_t get_debug_info;
    zend_object_get_closure_t get_closure;
};
You can read about it in detail here.
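The idea of a table of handlers that the engine calls on object events can be sketched with plain function pointers. Everything here is a hypothetical miniature (sketch_handlers, sketch_object, sketch_engine_read); the real zend_object_handlers has many more entries and different signatures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_object sketch_object;

/* Hypothetical, much smaller analogue of zend_object_handlers. */
typedef struct sketch_handlers {
    void (*add_ref)(sketch_object *obj);
    void (*del_ref)(sketch_object *obj);
    long (*read_property)(sketch_object *obj, const char *name);
} sketch_handlers;

struct sketch_object {
    unsigned int refcount;
    long answer;                      /* single illustrative property */
    const sketch_handlers *handlers;  /* behavior attached to the object */
};

static void std_add_ref(sketch_object *obj) { obj->refcount++; }

static void std_del_ref(sketch_object *obj)
{
    assert(obj->refcount > 0);
    obj->refcount--;
}

static long std_read_property(sketch_object *obj, const char *name)
{
    (void)name;  /* the sketch has only one property */
    return obj->answer;
}

/* Custom handler: overrides property reads only. */
static long doubled_read_property(sketch_object *obj, const char *name)
{
    (void)name;
    return obj->answer * 2;
}

const sketch_handlers sketch_std_handlers =
    { std_add_ref, std_del_ref, std_read_property };
const sketch_handlers sketch_doubling_handlers =
    { std_add_ref, std_del_ref, doubled_read_property };

/* The engine side never calls a concrete function directly. */
long sketch_engine_read(sketch_object *obj, const char *name)
{
    return obj->handlers->read_property(obj, name);
}
```

Two objects with identical data but different handler tables answer the same read differently, which is the kind of behavior override an extension gets by substituting entries in the handlers table.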
We got distracted; back to zend_object_value. So what does it contain besides the pointer to the event handlers? Nothing! While _zend_object_handlers is at least an attempt to define an object's behavior, zend_object_value holds no instance-specific data apart from some strange identifier (handle). But an identifier on its own (a spherical one in a vacuum) is meaningless. So there must be some storage of homogeneous entities in which this identifier distinguishes one entity from another.
In ZE this storage is the Zend Object Storage. Its keys are object handles (zend_object_value.handle), and its values... Yes, you have probably guessed already: yet another structure, zend_object (zend.h):
typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;
    HashTable *guards;
} zend_object;
Here the scattered pieces finally start to form a complete picture. HashTable *properties is where data specific to a particular instance can be kept. The properties field is a standard ZE associative array whose keys are the names of the fields of the current object's class and whose values are the current values of those fields (properties) for this object.
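The role of the handle as a key into a storage of instances can be sketched as follows. This is a simplified model, not the real Zend Object Storage: sketch_store, sketch_store_put and sketch_store_get are hypothetical names, the store has a fixed capacity, and the handle is simply the slot index.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_STORE_CAPACITY 8

/* Hypothetical stand-in for zend_object. */
typedef struct sketch_object {
    const char *class_name;   /* stands in for zend_class_entry *ce */
    long counter;             /* stands in for HashTable *properties */
} sketch_object;

/* Hypothetical stand-in for the object storage. */
typedef struct sketch_store {
    sketch_object *slots[SKETCH_STORE_CAPACITY];
    unsigned int used;
} sketch_store;

/* Registers an object and returns its handle (here: the slot index). */
unsigned int sketch_store_put(sketch_store *store, sketch_object *obj)
{
    assert(store != NULL && store->used < SKETCH_STORE_CAPACITY);
    store->slots[store->used] = obj;
    return store->used++;
}

/* Resolves a handle back to the instance data, or NULL if unknown. */
sketch_object *sketch_store_get(const sketch_store *store,
                                unsigned int handle)
{
    return handle < store->used ? store->slots[handle] : NULL;
}
```

Two objects of the same class get different handles, and each handle leads back to that instance's own data, which is exactly the part that zend_object_value alone does not carry.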
So at this point we have the following options for working with objects: we can override the standard behavior of an object in certain situations (by replacing the corresponding functions in zend_object_handlers), and we can store data in instance fields by writing them to the HashTable *properties of the current object. Something is missing... Oh yes: how do we add custom behavior to the object (create new methods)? Since methods are common to all objects of one class, it is logical to place them in a structure shared by all objects of that class. This structure is zend_class_entry (zend.h):
struct _zend_class_entry {
    char type;
    char *name;
    zend_uint name_length;
    struct _zend_class_entry *parent;
    int refcount;
    zend_bool constants_updated;
    zend_uint ce_flags;
    HashTable function_table;
    HashTable default_properties;
    HashTable properties_info;
    HashTable default_static_members;
    HashTable *static_members;
    HashTable constants_table;
    const struct _zend_function_entry *builtin_functions;
    /* ... */
};
The purpose of zend_class_entry is to represent everything that objects of the same class have in common; zend_class_entry effectively is the class. As you can see, the structure is not small, and examining each of its fields is not the goal of this article. Let's focus on the following fields.
const struct _zend_function_entry *builtin_functions is a pointer to an array of _zend_function_entry structures. It is easy to guess that these are the methods of our future class. Since the field is marked const, changing the elements of this array (i.e. overriding the methods of an already instantiated object) is not possible (unlike zend_object_value.handlers).
The fields from constructor through unserialize_func are PHP's magic methods. Their names are mnemonic, with one caveat: as far as I can tell from the engine sources, serialize_func and unserialize_func cache the serialize() and unserialize() methods of the Serializable interface rather than __sleep and __wakeup.
In the process of creating your own extension, you can either add entries to builtin_functions or override the magic methods of the future class.
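The point that methods live in one table shared by all instances of a class can be sketched like this. The types sketch_function_entry, sketch_class_entry and sketch_instance and the Counter class are hypothetical; only the general shape (a terminated array of name and handler pairs reachable through the class entry) follows the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct sketch_instance sketch_instance;
typedef long (*sketch_method)(sketch_instance *self);

/* Hypothetical analogue of a function entry: a name plus a C handler. */
typedef struct sketch_function_entry {
    const char *name;
    sketch_method handler;
} sketch_function_entry;

/* Hypothetical analogue of zend_class_entry: one per class. */
typedef struct sketch_class_entry {
    const char *name;
    const sketch_function_entry *builtin_functions; /* ends with {NULL, NULL} */
} sketch_class_entry;

struct sketch_instance {
    const sketch_class_entry *ce;   /* shared by all instances of the class */
    long value;                     /* per-instance data */
};

static long counter_get(sketch_instance *self) { return self->value; }
static long counter_increment(sketch_instance *self) { return ++self->value; }

static const sketch_function_entry counter_functions[] = {
    { "get", counter_get },
    { "increment", counter_increment },
    { NULL, NULL }                  /* terminator */
};

const sketch_class_entry sketch_counter_class =
    { "Counter", counter_functions };

/* Looks a method up by name in the class's shared table. */
sketch_method sketch_find_method(const sketch_class_entry *ce,
                                 const char *name)
{
    const sketch_function_entry *fe;
    assert(ce != NULL && name != NULL);
    for (fe = ce->builtin_functions; fe->name != NULL; fe++) {
        if (strcmp(fe->name, name) == 0) {
            return fe->handler;
        }
    }
    return NULL;
}
```

Both instances point at the same sketch_counter_class, so the method table exists once, while value is stored per instance.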
Last but not least, the final hero of this excursion into the fascinating world of ZE structures is the one that represents the extension module itself, zend_module_entry (zend_modules.h):
struct _zend_module_entry {
    unsigned short size;
    unsigned int zend_api;
    unsigned char zend_debug;
    unsigned char zts;
    const struct _zend_ini_entry *ini_entry;
    const struct _zend_module_dep *deps;
    const char *name;
    const struct _zend_function_entry *functions;
    int (*module_startup_func)(INIT_FUNC_ARGS);
    int (*module_shutdown_func)(SHUTDOWN_FUNC_ARGS);
    int (*request_startup_func)(INIT_FUNC_ARGS);
    int (*request_shutdown_func)(SHUTDOWN_FUNC_ARGS);
    void (*info_func)(ZEND_MODULE_INFO_FUNC_ARGS);
    const char *version;
    size_t globals_size;
#ifdef ZTS
    ts_rsrc_id* globals_id_ptr;
#else
    void* globals_ptr;
#endif
    void (*globals_ctor)(void *global TSRMLS_DC);
    void (*globals_dtor)(void *global TSRMLS_DC);
    int (*post_deactivate_func)(void);
    int module_started;
    unsigned char type;
    void *handle;
    int module_number;
    char *build_id;
};
Since, in general, an extension can implement not one but several classes at once, or, on the contrary, export only functions (OOP is not the be-all and end-all, as they say), let's see how this is reflected in the design of the structure above:
- const struct _zend_function_entry *functions - a pointer to an array of functions exported by the extension. Similar to zend_class_entry.builtin_functions.
- int (*module_startup_func)(INIT_FUNC_ARGS) - a pointer to the function that is called when the extension is loaded. In particular, if an extension exports classes, this is the function in which they must be registered in the internal class registry of ZE 2.
- int (*module_shutdown_func)(SHUTDOWN_FUNC_ARGS) - a pointer to the function that is called when the extension is unloaded. This is where we clean up after ourselves.
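The startup and shutdown hooks listed above can be illustrated with a toy module loader. This is a sketch only: sketch_module_entry, sketch_load and sketch_unload are hypothetical, the hooks take no arguments (the real ones receive INIT_FUNC_ARGS and SHUTDOWN_FUNC_ARGS), and a simple counter stands in for the engine's class registry.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUCCESS 0

/* Hypothetical, heavily reduced analogue of zend_module_entry. */
typedef struct sketch_module_entry {
    const char *name;
    int (*module_startup_func)(void);   /* called when the module is loaded */
    int (*module_shutdown_func)(void);  /* called when it is unloaded */
    int module_started;
} sketch_module_entry;

/* Stands in for the engine's internal class registry. */
static int registered_classes = 0;

static int demo_startup(void)
{
    registered_classes++;   /* "register" the class the module exports */
    return SKETCH_SUCCESS;
}

static int demo_shutdown(void)
{
    registered_classes--;   /* clean up after ourselves */
    return SKETCH_SUCCESS;
}

sketch_module_entry sketch_demo_module =
    { "demo", demo_startup, demo_shutdown, 0 };

int sketch_registered_classes(void) { return registered_classes; }

/* What the engine does on load: run the startup hook once. */
int sketch_load(sketch_module_entry *m)
{
    assert(m != NULL && !m->module_started);
    if (m->module_startup_func != NULL
        && m->module_startup_func() != SKETCH_SUCCESS) {
        return -1;
    }
    m->module_started = 1;
    return SKETCH_SUCCESS;
}

/* What the engine does on unload: run the shutdown hook. */
int sketch_unload(sketch_module_entry *m)
{
    assert(m != NULL && m->module_started);
    m->module_started = 0;
    return m->module_shutdown_func != NULL
        ? m->module_shutdown_func() : SKETCH_SUCCESS;
}
```

The module itself only fills in the structure; it is the loader that decides when the hooks run, which is why class registration belongs in the startup hook.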
Data type hierarchy
In the previous section we traced the path from the ordinary zval to the all-encompassing zend_module_entry. By skillfully operating with these types, you can create your own PHP extension and set up a well-organized factory for producing objects. In fact, a PHP extension resembles a training school that supplies userspace with staff. First it has to be built (a call to PHP_MINIT_FUNCTION) and registered at the labor exchange as a recruitment agency with a certain specialization (by declaring the exported classes or functions in PHP_MINIT_FUNCTION), and then, on the first request for an employee (a new instance of the class), it puts the recruit through a training course (creates the object). The training consists of allocating memory for the new object, associating it with specific event handlers (zend_object_handlers) and with its own class (zend_class_entry), which contains the methods of the future object, registering the object in the Zend Object Storage, and assigning it a unique identifier. This training is usually placed in a function such as extension_objects_new, which is assigned to the zend_class_entry.create_object field.
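The object creation steps listed above can be sketched as one function hooked into a class entry. All names here are hypothetical (sketch_objects_new plays the role of extension_objects_new from the text), the storage is a fixed global array, and the handle is the slot index; the real engine uses its own allocation and storage functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sketch_class_entry sketch_class_entry;

/* Hypothetical stand-ins for the structures discussed in the article. */
typedef struct sketch_handlers { const char *label; } sketch_handlers;

typedef struct sketch_object {
    const sketch_class_entry *ce;
    long payload;
} sketch_object;

typedef struct sketch_object_value {
    unsigned int handle;
    const sketch_handlers *handlers;
} sketch_object_value;

struct sketch_class_entry {
    const char *name;
    sketch_object_value (*create_object)(const sketch_class_entry *ce);
};

#define SKETCH_STORE_CAPACITY 16
static sketch_object *store[SKETCH_STORE_CAPACITY];
static unsigned int store_used = 0;
static const sketch_handlers default_handlers = { "default" };

/* Plays the role of extension_objects_new from the text. */
sketch_object_value sketch_objects_new(const sketch_class_entry *ce)
{
    sketch_object_value retval;
    sketch_object *obj = calloc(1, sizeof *obj);  /* 1. allocate memory */
    assert(obj != NULL && store_used < SKETCH_STORE_CAPACITY);
    obj->ce = ce;                                 /* 2. tie to its class */
    store[store_used] = obj;                      /* 3. register in the store */
    retval.handle = store_used++;                 /* 4. hand out the identifier */
    retval.handlers = &default_handlers;          /* 5. attach the handlers */
    return retval;
}

const sketch_class_entry sketch_soldier_class =
    { "Soldier", sketch_objects_new };

/* Resolves an object value back to the stored instance. */
sketch_object *sketch_lookup(sketch_object_value v)
{
    return v.handle < store_used ? store[v.handle] : NULL;
}
```

Calling sketch_soldier_class.create_object(&sketch_soldier_class) twice yields two values with different handles but the same handlers pointer, and each handle resolves to its own instance of the same class.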
Schematically, the structure of the extension can be represented as follows:

And in order to more clearly visualize the data type hierarchy in the extension, I’ll give the following schema:

Conclusion
The article turned out to have a lot of text and very little code, but without an excursion into the world of ZE 2 data types it would be quite hard to understand why certain functions are called. In the next part I will explain the first steps needed to create your own PHP extension module, but before that I will touch on working with zvals and memory allocation management.
P.S. An excellent selection of articles for those interested is here.