
Based on the results of the survey in the first article, I decided to write an overview of how the extension is implemented. In the meantime, the syntax has changed slightly to suit existing IDEs, which was perhaps the most contested point.
This is not yet another hello-world extension article: those who want to learn the basics can easily find plenty of material both on Habr itself and in the Russian-language RTFG.
This article is about the prerequisites, the implementation, and the pitfalls. There will be a little PHP and mostly C.
Prerequisites
If you don't feel like reading a long preamble, you can skip straight to Implementation.
I'll start from afar. I like Python, especially some of its syntax. And since I have been writing mostly PHP lately, I want my working tool to be more convenient and functional. PHP has been developing well in recent versions, but it still has no decorators. So I have to take matters into my own hands. First there was the idea of unifying the names and parameter order of core functions (another extension of mine, which is still at the idea stage; if there is interest, I can write about its creation), and now here are decorators and a few other things.
Function decorators (methods are not mentioned here or below, since decorating them works the same way) let you change how a function works by wrapping its calls in an additional layer of logic. A list of wrappers is described declaratively, and in the end they must return something that replaces the original function. In Python syntax this looks as follows:

    @decomaker(argA, argB, ...)
    def func(arg1, arg2, ...):
        # ...

    # is equivalent to:
    func = decomaker(argA, argB, ...)(func)
PHP has no such feature. At first I decided to take this syntax as it is and carry it over unchanged (except for how decorator parameters are passed at call time; see below). The first article describes exactly that syntax. However, IDEs with syntax checking and one of the two camps of commenters made me think again. As a result, the syntax was made more portable. A decorator is now declared in a single-line # comment:
- # is partially deprecated, so its use stands out more than the usual //, /* and /**;
- A single-line comment does not get folded by the editor and is harder to overlook;
- If decorators do eventually appear in PHP 5.x, the need for this extension with its strange syntax will simply disappear.
Having settled on the declaration format, we need to decide how the decorators themselves are implemented. A decorator function must return a function that replaces the original decorated one. This is where anonymous functions and closures come in:

    <?php
    function decor($func)
    {
        echo "Decorator!\n";
        return function () use ($func) {
            return call_user_func_array($func, func_get_args());
        };
    }

    function a($v)
    {
        var_dump($v);
    }

    $b = decor('a');
    $b(42);

In PHP, an indirect function call that forwards its parameters is, of course, verbose; there is no getting around that.
The result is the syntax from the image at the beginning of the article:

    <?php
    function decor($func)
    {
        return function () {};
    }

And I want to get it without rewriting the Zend lexer, so that PHP itself does not have to be rebuilt (if it works, don't touch it).
Implementation
There are two options for doing this:
- Change the source code before it reaches the PHP parser;
- Change the finished opcodes, but then the syntax has to be compatible with the existing one.
The second option looked dubious in terms of compatibility with all sorts of opcode caches and optimizers. Besides, the initial version of the decorator syntax (without the # comment) would not have worked with it.
So the first option was chosen.
Zend has two entry points through which source code arrives: zend_compile_file (for files) and zend_compile_string (for strings, for example from eval).
In both cases we have pointers to functions with a specific implementation. The standard implementations can be found by looking at how the pointers are initialized in zend_startup, where they are set to compile_file and compile_string.
Both functions take source code in one form or another and return the opcode array as a _zend_op_array. Unfortunately, despite the similarity of their tasks, they are implemented differently, so we will have to hook both.
Hooking function pointers like these is routine practice in Zend and in PHP extensions. For example, the same zend_compile_file is replaced in ZendAccelerator and phar, not to mention third-party extensions.
To substitute a pointer you only need to implement your own analogue and replace the pointer while keeping the original. Everything as usual.
The result is roughly the following:

    PHP_MINIT_FUNCTION(decorators);
    PHP_MSHUTDOWN_FUNCTION(decorators);

    zend_module_entry decorators_module_entry = {
        // ...
        decorators_functions,
        PHP_MINIT(decorators),
        PHP_MSHUTDOWN(decorators),
        // ...
    };

    zend_op_array *(*decorators_orig_zend_compile_string)(zval *source_string, char *filename TSRMLS_DC);
    zend_op_array *(*decorators_orig_zend_compile_file)(zend_file_handle *file_handle, int type TSRMLS_DC);

    zend_op_array* decorators_zend_compile_string(zval *source_string, char *filename TSRMLS_DC);
    zend_op_array* decorators_zend_compile_file(zend_file_handle *file_handle, int type TSRMLS_DC);

    /* {{{ PHP_MINIT_FUNCTION */
    PHP_MINIT_FUNCTION(decorators)
    {
        decorators_orig_zend_compile_string = zend_compile_string;
        zend_compile_string = decorators_zend_compile_string;

        decorators_orig_zend_compile_file = zend_compile_file;
        zend_compile_file = decorators_zend_compile_file;

        return SUCCESS;
    }
    /* }}} */

    /* {{{ PHP_MSHUTDOWN_FUNCTION */
    PHP_MSHUTDOWN_FUNCTION(decorators)
    {
        zend_compile_string = decorators_orig_zend_compile_string;
        zend_compile_file = decorators_orig_zend_compile_file;

        return SUCCESS;
    }
    /* }}} */

    zend_op_array* decorators_zend_compile_string(zval *source_string, char *filename TSRMLS_DC) /* {{{ */
    {
        return decorators_orig_zend_compile_string(source_string, filename TSRMLS_CC);
    }
    /* }}} */

    zend_op_array* decorators_zend_compile_file(zend_file_handle *file_handle, int type TSRMLS_DC) /* {{{ */
    {
        return decorators_orig_zend_compile_file(file_handle, type TSRMLS_CC);
    }
    /* }}} */
During module initialization (the module being our extension) the pointers are swapped, and on shutdown we don't forget to put everything back. Inside the substituted functions we simply call the original implementation.
Not everything can be done during module initialization, but in our case it is quite enough.
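The save, replace, restore pattern can be tried outside of PHP as well. Below is a minimal standalone sketch with no Zend headers: compile_fn, compile_hook, counting_compile and the other names are hypothetical stand-ins invented for the illustration, and the "compiler" merely returns the length of its input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the signature of a compile function. */
typedef size_t (*compile_fn)(const char *source);

/* Stand-in for the standard implementation: "compiles" by measuring length. */
size_t default_compile(const char *source)
{
    assert(source != NULL);
    return strlen(source);
}

/* The replaceable global pointer, playing the role of zend_compile_string. */
compile_fn compile_hook = default_compile;

/* The saved original, playing the role of decorators_orig_zend_compile_string. */
compile_fn orig_compile_hook = NULL;

/* Counts how many times our replacement was entered. */
int hook_calls = 0;

/* Our replacement: do something extra, then delegate to the original. */
size_t counting_compile(const char *source)
{
    hook_calls++;
    return orig_compile_hook(source);
}

/* Plays the role of PHP_MINIT: remember the original, install ours. */
void hook_install(void)
{
    orig_compile_hook = compile_hook;
    compile_hook = counting_compile;
}

/* Plays the role of PHP_MSHUTDOWN: put the original back. */
void hook_remove(void)
{
    compile_hook = orig_compile_hook;
}
```

Callers only ever go through compile_hook, so they do not notice whether a hook is installed. Several extensions can chain this way, as long as each one restores exactly the pointer it saved.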
With compile_string everything is more or less clear (the source string comes in as the argument), but with compile_file things are not so rosy: we don't have the source code, only a description of the source in zend_file_handle. And different sets of fields are used in different cases.
The actual reading of the source is buried pretty deep:

    ZEND_API zend_op_array *compile_file(zend_file_handle *file_handle, int type TSRMLS_DC)
    {
        // ...
        open_file_for_scanning(file_handle TSRMLS_CC)
        // ...
    }

    ZEND_API int open_file_for_scanning(zend_file_handle *file_handle TSRMLS_DC)
    {
        // ...
        zend_stream_fixup(file_handle, &buf, &size TSRMLS_CC)
        // ...
    }
The most interesting part for us here is zend_stream_fixup, a function that unifies all the sources of input and produces the buffer that was read together with its size. That seems to be exactly what we need, but we cannot influence zend_stream_fixup or open_file_for_scanning; we only have control over compile_file.
Some people have copied these functions and all their dependencies into their own code, but we will take an easier route. If you look at the source of zend_stream_fixup, you can see that all handle types are reduced to a single one, ZEND_HANDLE_MAPPED, which keeps the source code and its length in file_handle->handle.stream.mmap.buf and file_handle->handle.stream.mmap.len. Moreover, if file_handle already has this type, almost nothing is changed and everything is returned as is.
It follows that if we pass compile_file() a zend_file_handle *file_handle that is already in ZEND_HANDLE_MAPPED form with all fields set correctly, compile_file will accept it as if nothing had happened. We can achieve this by calling zend_stream_fixup ourselves (it is a Zend API function, not a replaceable pointer) before the call to compile_file. The repeated call inside open_file_for_scanning then simply changes nothing.
Let's try:

    zend_op_array* decorators_zend_compile_file(zend_file_handle *file_handle, int type TSRMLS_DC)
    {
        char *buf;
        size_t size;

        if (zend_stream_fixup(file_handle, &buf, &size TSRMLS_CC) == FAILURE) {
            return NULL;
        }

        // from here on file_handle is of type ZEND_HANDLE_MAPPED

        return decorators_orig_zend_compile_file(file_handle, type TSRMLS_CC);
    }
Hooray, it works. Moreover, the source is now available in file_handle->handle.stream.mmap.buf / len no matter where PHP took it from: stdin, a file descriptor, an include over an http stream, and so on. All that remains is to put our modified version of the code there and call the original zend_compile_file.
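Why the second fixup call is harmless can be shown with a toy model. The sketch below does not use the real Zend types: toy_file_handle, toy_fixup and the other names are hypothetical, and the "file" is just an in-memory string. It only demonstrates the idea that normalizing an already normalized handle is a no-op, so the buffer can be swapped between the two calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, NOT the real Zend handle types. */
typedef enum { TOY_HANDLE_FILENAME, TOY_HANDLE_MAPPED } toy_handle_type;

typedef struct {
    toy_handle_type type;
    const char *pending;  /* where the source would be read from */
    const char *buf;      /* valid only when type == TOY_HANDLE_MAPPED */
    size_t len;
    int reads;            /* how many times the source was actually read */
} toy_file_handle;

/* Plays the role of zend_stream_fixup: normalize to MAPPED, report buf/len. */
int toy_fixup(toy_file_handle *h, const char **buf, size_t *len)
{
    assert(h != NULL && buf != NULL && len != NULL);
    if (h->type != TOY_HANDLE_MAPPED) {
        if (h->pending == NULL) {
            return -1;
        }
        h->buf = h->pending;          /* "read" the source */
        h->len = strlen(h->pending);
        h->type = TOY_HANDLE_MAPPED;
        h->reads++;
    }
    /* Already mapped: hand back what is stored, change nothing. */
    *buf = h->buf;
    *len = h->len;
    return 0;
}

/* Plays the role of the original compile_file, which fixes up internally. */
size_t toy_compile_file(toy_file_handle *h)
{
    const char *buf;
    size_t len;
    if (toy_fixup(h, &buf, &len) != 0) {
        return 0;
    }
    return len;  /* "compiling" here just reports how much source it saw */
}

/* Our hook: fix up first, swap in the rewritten source, then delegate. */
size_t toy_hooked_compile_file(toy_file_handle *h, const char *replacement)
{
    const char *buf;
    size_t len;
    assert(replacement != NULL);
    if (toy_fixup(h, &buf, &len) != 0) {
        return 0;
    }
    h->buf = replacement;
    h->len = strlen(replacement);
    return toy_compile_file(h);
}
```

In the toy model the inner fixup sees the replaced buffer and the source is read only once, which mirrors the behaviour described above for ZEND_HANDLE_MAPPED.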
I won't describe how decorators_preprocessor() works: obviously, it takes a string, passes it to the preprocessor, and returns the resulting string. Pieces of code from this function will appear below anyway. It remains to look at the preprocessor itself.
Passing the different kinds of source data to a single function:

    void preprocessor(zval *source_zv, zval *return_value TSRMLS_DC)
    {
        // source_zv is the input, return_value receives the result
    }

    /* {{{ DECORS_CALL_PREPROCESS */
    #define DECORS_CALL_PREPROCESS(result_zv, buf, len) \
        do { \
            zval *source_zv; \
            ALLOC_INIT_ZVAL(result_zv); \
            ALLOC_INIT_ZVAL(source_zv); \
            ZVAL_STRINGL(source_zv, (buf), (len), 1); \
            preprocessor(source_zv, result_zv TSRMLS_CC); \
            zval_dtor(source_zv); \
            FREE_ZVAL(source_zv); \
        } while (0)
    /* }}} */

    /* {{{ proto string decorators_preprocessor(string $code) */
    PHP_FUNCTION(decorators_preprocessor)
    {
        char *source;
        int source_len;
        zval *result;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &source, &source_len) == FAILURE) {
            return;
        }

        DECORS_CALL_PREPROCESS(result, source, source_len);
        // ...
    }
    /* }}} */

    zend_op_array* decorators_zend_compile_string(zval *source_string, char *filename TSRMLS_DC) /* {{{ */
    {
        zval *result;
        DECORS_CALL_PREPROCESS(result, Z_STRVAL_P(source_string), Z_STRLEN_P(source_string));
        // ...
    }
    /* }}} */

    zend_op_array* decorators_zend_compile_file(zend_file_handle *file_handle, int type TSRMLS_DC) /* {{{ */
    {
        // ...
        zval *result;
        DECORS_CALL_PREPROCESS(result, file_handle->handle.stream.mmap.buf, file_handle->handle.stream.mmap.len);
        // ...
    }
    /* }}} */
Find and redo!
The task of the preprocessor is to find decorator declarations and to modify the code of the functions they affect. This is best done on the tokens of the source text. In order not to reinvent the wheel, the native Zend lexical scanner lex_scan is used; an example of using it for one's own purposes can be seen in the implementation of token_get_all and of tokenize, which token_get_all calls internally.
- Save the current state of the scanner in whose context our code is running:
    zend_lex_state original_lex_state;
    zend_save_lexical_state(&original_lex_state TSRMLS_CC);
- Prepare the source string for parsing:
    zend_prepare_string_for_scanning(&source_z, "" TSRMLS_CC)
- Set the initial state of the lexer (all options here):
    LANG_SCNG(yy_state) = yycST_IN_SCRIPTING;
  Unlike token_get_all, we are already parsing PHP code, so we do not need an opening tag. Accordingly, the initial state is yycST_IN_SCRIPTING rather than yycINITIAL.
- In a loop, fetch all the tokens of the source string:
    zval token_zv;
    int token_type;
    while (token_type = lex_scan(&token_zv TSRMLS_CC)) {
        // ...
    }
  token_type is the type of the token:
  - < 256 is the character code of a single-character token;
  - >= 256 is the value of a T_* constant. A string description for a token_type can be obtained via PHP_FUNCTION(token_name) / get_token_type_name.
  token_zv contains the value of the lexeme itself. As an alternative, you can use the yy_text and yy_leng fields of the zend_lex_state structure, which hold the address of the first byte of the current token and its length, respectively. Access to these fields, like many things in Zend, goes through the corresponding macros:
    #define zendtext LANG_SCNG(yy_text)
    #define zendleng LANG_SCNG(yy_leng)
  From here on we use char *zendtext and unsigned int zendleng.
  To avoid a memory leak, keep in mind that the token_zv value is sometimes taken as is from the source buffer, and sometimes memory is allocated for it, which then has to be freed. Those who are interested can look at the lex_scan() code; for now we simply take the necessary piece of logic from token_get_all.
- Restore the state of the scanner in whose context our code is running:
    zend_restore_lexical_state(&original_lex_state TSRMLS_CC);
That's it, we have a lexical analysis of the source code. But a few more points about parsing deserve attention.
If PHP hits a parse error, the handler raises an error or exception whose message contains a file name and a line number taken from the _zend_compiler_globals state. The file name, for example, is taken from the compiled_filename field, which is set when zend_prepare_string_for_scanning() is called. It is used inside zend_error (which generates all E_* errors; this extension also uses it to raise E_PARSE). However, zend_error() uses compiled_filename only if Zend is in the compiling state (zend_bool in_compilation, in the same _zend_compiler_globals), and that state is not switched on by itself when we merely scan the source.
So before parsing we switch to "compiling":
    zend_bool original_in_compilation = CG(in_compilation);
    CG(in_compilation) = 1;
And at the end we put everything back:
    CG(in_compilation) = original_in_compilation;
Now, if we pass the correct file name to zend_prepare_string_for_scanning, any errors become much more informative. The current file name can be obtained via zend_get_compiled_filename(), which, however, may return NULL, and passing NULL to zend_prepare_string_for_scanning makes PHP segfault.
It remains to set the correct file name in decorators_preprocessor and decorators_zend_compile_file:

    PHP_FUNCTION(decorators_preprocessor)
    {
        // ...
        /* zend_get_compiled_filename() takes no other arguments, hence TSRMLS_C */
        char *prev_filename = zend_get_compiled_filename(TSRMLS_C) ? zend_get_compiled_filename(TSRMLS_C) : "";
        zend_set_compiled_filename("-" TSRMLS_CC);

        DECORS_CALL_PREPROCESS(result, source, source_len);

        zend_set_compiled_filename(prev_filename TSRMLS_CC);
        // ...
    }

    zend_op_array* decorators_zend_compile_file(zend_file_handle *file_handle, int type TSRMLS_DC) /* {{{ */
    {
        // ...
        char *prev_filename = zend_get_compiled_filename(TSRMLS_C) ? zend_get_compiled_filename(TSRMLS_C) : "";
        const char* filename = (file_handle->opened_path) ? file_handle->opened_path : file_handle->filename;
        zend_set_compiled_filename(filename TSRMLS_CC);

        zval *result;
        DECORS_CALL_PREPROCESS(result, file_handle->handle.stream.mmap.buf, file_handle->handle.stream.mmap.len);

        zend_set_compiled_filename(prev_filename TSRMLS_CC);
        // ...
    }
In decorators_zend_compile_string, the file name is already known.
Source code modification
Now that we have everything needed for preprocessing, it remains to actually do it. Turning a text made of pieces (tokens) into the final text could be rather awkward in C because of all the string concatenation. However, /PHP/ext/standard/php_smart_str.h contains an implementation of smart strings, which is very handy here.
In short:

    smart_str str = {0};
    smart_str str2 = {0};
    smart_str_appendc(&str, '!');
    smart_str_appendl(&str, "hello", 5);
    smart_str_append(&str, &str2);
    smart_str_append_long(&str, 42);
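For readers without the PHP sources at hand, here is a minimal standalone sketch of the same idea: a growable buffer that accepts (pointer, length) pieces, which is exactly the form in which tokens arrive. It is not the real smart_str; toy_str, toy_str_appendl and toy_str_free are hypothetical names, and the growth policy (doubling from 16 bytes) is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable string, NOT the real smart_str. */
typedef struct {
    char *data;  /* NUL-terminated once anything has been appended */
    size_t len;  /* bytes in use, not counting the terminator */
    size_t cap;  /* bytes allocated */
} toy_str;

/* Append len bytes of piece; returns 0 on success, -1 if allocation fails. */
int toy_str_appendl(toy_str *s, const char *piece, size_t len)
{
    assert(s != NULL && (piece != NULL || len == 0));
    if (s->len + len + 1 > s->cap) {
        size_t new_cap = s->cap ? s->cap : 16;
        char *grown;
        while (new_cap < s->len + len + 1) {
            new_cap *= 2;
        }
        grown = realloc(s->data, new_cap);
        if (grown == NULL) {
            return -1;  /* the old buffer is still valid and owned by s */
        }
        s->data = grown;
        s->cap = new_cap;
    }
    if (len > 0) {
        memcpy(s->data + s->len, piece, len);
    }
    s->len += len;
    s->data[s->len] = '\0';
    return 0;
}

/* Release the buffer and reset the structure to its empty state. */
void toy_str_free(toy_str *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}
```

A token loop would call toy_str_appendl(&out, token_text, token_len) for every piece it wants to keep and append its own text in between, in the same way the extension does with smart_str_appendl.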
In the token loop we glue the resulting string together from the tokens (zendtext, zendleng), changing or adding our own text where necessary. The decorator replacement algorithm itself is, IMHO, not that interesting. One potentially interesting detail is the check that a T_COMMENT token looks like a decorator declaration: the token is matched against the pattern '^#[ \t]*@' (with a simple loop, without regexps) and the address of the '@' is returned.
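The check described above could look roughly like this. It is a sketch rather than the extension's actual code: the function name decorator_at is hypothetical, and the pattern is assumed to be '^#[ \t]*@' (a '#', then any number of spaces or tabs, then '@'). The text is passed as pointer plus length, as it would be with zendtext and zendleng.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns a pointer to the '@' if the token text matches ^#[ \t]*@,
 * or NULL if the comment is not a decorator declaration.
 * The text does not have to be NUL-terminated; len bounds the scan.
 */
const char *decorator_at(const char *text, size_t len)
{
    size_t i;

    assert(text != NULL || len == 0);

    /* The comment must start with '#', not with "//" or a block comment. */
    if (len == 0 || text[0] != '#') {
        return NULL;
    }

    /* Skip spaces and tabs; the first other character must be '@'. */
    for (i = 1; i < len; i++) {
        if (text[i] == ' ' || text[i] == '\t') {
            continue;
        }
        return text[i] == '@' ? text + i : NULL;
    }

    /* Only '#' and whitespace: an ordinary (empty) comment. */
    return NULL;
}
```

Returning the address of the '@' rather than a boolean lets the caller continue parsing the decorator name and its parameters from that position without scanning the prefix again.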
A little PHP to finish
When decorators are processed, the source code of the decorated function changes slightly: the body of the function is wrapped in an anonymous function, which is passed as a parameter to the nearest decorator. That is, for the code
preprocessing produces the following code:
A, C, D and X stand for arbitrary code that is copied as is.
This has the following consequences:
- If the decorated function is declared with parameters that have default values, the anonymous function gets all of them as well:
    function foo($a, $b = 42, $c = array(100, 500))
    {...
    =>
    function foo($a, $b = 42, $c = array(100, 500))
    {return call_user_func_array(...(function($a, $b = 42, $c = array(100, 500)) {...
- If a decorator name is wrong, the line reported in the error message is the line with the opening '{' of the function body. Likewise, errors in decorator parameters are reported on the line with the closing '}' of the function body;
- The variables of the decorated function's parameters can be used as decorator parameters, which gives read/write control over those parameters;
- Since func_get_args() is used to forward the parameters, passing a parameter by reference to the decorated function does not currently work.
Well, that's all. If you really have read this far, I hope it was interesting.
As in the previous article, here is the link to github.