Error exception will be thrown. For example:
1 |> 3; // [1, 2, 3]
2.5 |> 5; // [2.5, 3.5, 4.5]
$a = $b = 1;
$a |> $b; // [1]
2 |> 1; // Error exception
1 |> '1'; // Error exception
new StdClass |> 1; // Error exception
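The semantics listed above can be mimicked outside the engine. The following is only a rough sketch in plain C, not the engine code. The helper name range_fill is hypothetical, it works on double values only (so it ignores the int/float distinction the operator makes), and the caller supplies the output buffer.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PHP): writes min, min + 1, ... into out
 * without going past max. Returns the number of elements written, or -1
 * when min > max or when the buffer is too small. */
long range_fill(double min, double max, double *out, size_t cap)
{
    if (!(min <= max)) {
        return -1;              /* mirrors the "Error exception" case */
    }

    double span = max - min;    /* one less than the element count */
    if (!(span < (double) cap)) {
        return -1;              /* range does not fit into the buffer */
    }

    /* The cast drops the fractional part: 2.5 .. 5 has a span of 2.5,
     * which gives 2 + 1 = 3 elements. */
    size_t count = (size_t) span + 1;

    for (size_t i = 0; i < count; ++i) {
        out[i] = min + (double) i;
    }
    return (long) count;
}
```

Called with 2.5 and 5 the sketch fills the buffer with 2.5, 3.5 and 4.5, and with 2 and 1 it reports an error, which matches the examples above.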
The lexer must return the T_RANGE token whenever it encounters |>. To do this, update the file Zend/zend_language_scanner.l. Add the following code to it (in the section where all tokens are declared, around line 1200):
<ST_IN_SCRIPTING>"|>" { RETURN_TOKEN(T_RANGE); }
The rule applies in ST_IN_SCRIPTING mode. This means that the |> character sequence is matched only while the scanner is in that mode. The code between the braces is C code that is executed when |> is detected in the source code. In this example, the T_RANGE token is returned.
Note. Since we modify the lexer, re2c is needed to regenerate it. Normal PHP builds do not need this dependency.
The T_RANGE identifier must be declared in the Zend/zend_language_parser.y file. To do this, add the following to the end of the section where the other token identifiers are declared (around line 220):
%token T_RANGE "|> (T_RANGE)"
The operator is now recognized by the lexer, but the parser does not yet know what to do with it:
1 |> 2; // Parse error: syntax error, unexpected '|>' (T_RANGE) in...
The tokenizer extension exposes the lexer to userland through the token_get_all and token_name functions. At the moment it is blissfully unaware of the T_RANGE token:
echo token_name(token_get_all('<?php 1|>2;')[2][0]); // UNKNOWN
Once the tokenizer knows about the new token, the same call returns its name:
echo token_name(token_get_all('<?php 1|>2;')[2][0]); // T_RANGE
Next, the parser must be told how the T_RANGE token is used in PHP scripts. The parser is also responsible for operator precedence and associativity.
Note. Precedence sets the rules for grouping expressions. For example, in the expression 3 + 4 * 2, the * operator has a higher precedence than +, so the expression is grouped as 3 + (4 * 2).
Associativity describes how an operator behaves when it is chained: whether the operator can be chained at all and, if so, how it is grouped within a given expression. Suppose the ternary operator is left-associative. Then it is grouped and executed from left to right. That is, the expression
1 ? 0 : 1 ? 0 : 1; // 1
is executed as
(1 ? 0 : 1) ? 0 : 1; // 1
If we change this and make the operator right-associative, the expression is executed as follows:
$a = 1 ? 0 : (1 ? 0 : 1); // 0
There are also non-associative operators, which cannot be chained at all. Take the < operator, for example. The following expression is an error:
1 < $a < 2;
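The effect of associativity on the ternary example can be checked in C, where the ?: operator happens to be right-associative. This is a small sketch for illustration only; the function names are made up, and the left-associative result is obtained with explicit parentheses.

```c
/* In C, ?: is right-associative, so the unparenthesized chain
 * parses as 1 ? 0 : (1 ? 0 : 1). */
int ungrouped(void)
{
    return 1 ? 0 : 1 ? 0 : 1;
}

/* Explicit left grouping: the inner result 0 selects the final 1. */
int left_grouped(void)
{
    return (1 ? 0 : 1) ? 0 : 1;
}

/* Explicit right grouping: the first condition selects 0 directly. */
int right_grouped(void)
{
    return 1 ? 0 : (1 ? 0 : 1);
}
```

Here ungrouped() and right_grouped() both return 0, while left_grouped() returns 1, which are the two results discussed above.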
The range operator will be non-associative and will share its precedence level with the comparison operators declared on the same line (such as T_SPACESHIP). This is done by adding the T_RANGE token to the end of the following line (around line 70):
%nonassoc T_IS_EQUAL T_IS_NOT_EQUAL T_IS_IDENTICAL T_IS_NOT_IDENTICAL T_SPACESHIP T_RANGE
The grammar rule for the operator belongs in the expr_without_variable production. Add the following code to it (for example, right before the T_SPACESHIP rule, around line 930):
| expr T_RANGE expr { $$ = zend_ast_create(ZEND_AST_RANGE, $1, $3); }
The zend_ast_create function is used to create the AST node for the new operator. The node kind is ZEND_AST_RANGE and it has two children: $1 refers to the left operand (the first expr in expr T_RANGE expr) and $3 refers to the right operand (the second expr).
The ZEND_AST_RANGE constant must be defined as well. To do this, update the Zend/zend_ast.h file by adding the constant to the list of node kinds with two children (for example, under ZEND_AST_COALESCE):
ZEND_AST_RANGE,
1 |> 2;
Add a case for the new node kind (ZEND_AST_RANGE) to the large switch statement in the zend_compile_expr function (for example, immediately after ZEND_AST_COALESCE):
case ZEND_AST_RANGE:
    zend_compile_range(result, ast);
    return;
Then define the zend_compile_range function:
void zend_compile_range(znode *result, zend_ast *ast) /* {{{ */
{
    zend_ast *left_ast = ast->child[0];
    zend_ast *right_ast = ast->child[1];
    znode left_node, right_node;

    zend_compile_expr(&left_node, left_ast);
    zend_compile_expr(&right_node, right_ast);

    zend_emit_op_tmp(result, ZEND_RANGE, &left_node, &right_node);
}
/* }}} */
First, the two children of the ZEND_AST_RANGE node are stored in the left_ast and right_ast pointers. Next, we declare two znode variables that hold the result of compiling the AST node of each operand. This is the recursive part of processing the tree and compiling its nodes into opcodes.
Then, using the zend_emit_op_tmp function, we generate the ZEND_RANGE opcode with its two operands.
To understand the choice of the zend_emit_op_tmp function, it helps to look briefly at opcodes and their operand types.
Note. The opcodes of a PHP script can be inspected using:
- PHPDBG:
sapi/phpdbg/phpdbg -np* program.php
- Opcache
- The Vulcan Logic Disassembler (VLD) extension:
sapi/cli/php -dvld.active=1 program.php
- If the script is short and simple, then you can use 3v4l
Opcode operand nodes (znode_op structures) can be of different types:
- IS_CV (Compiled Variables). These are simple variables (like $a), cached in simple arrays to bypass hash table lookups. They appeared in PHP 5.1 as the compiled variables optimization. In VLD they are denoted by !n (n is an integer).
- IS_VAR. For all complex expressions that act as variables (like $a->b). May contain an IS_REFERENCE zval. In VLD they are denoted by $n (n is an integer).
- IS_CONST. For literal values (for example, hard-coded strings).
- IS_TMP_VAR. Temporary variables hold the intermediate result of an expression (and are therefore short-lived). They can be reference counted (as of PHP 7), but cannot contain an IS_REFERENCE zval, because temporary variables cannot be used as references. In VLD they are denoted by ~n (n is an integer).
- IS_UNUSED. Usually used to mark an op node as unused. But sometimes znode_op.num can store data for use by the virtual machine.
This brings us back to zend_emit_op_tmp, which generates a zend_op whose result is of type IS_TMP_VAR. We need this because our operator is an expression, and the value (array) it produces is a temporary variable that can be used as an operand of another opcode (for example, ASSIGN from $var = 1 |> 3;).
Now the handler for the ZEND_RANGE opcode must be defined:
ZEND_VM_HANDLER(182, ZEND_RANGE, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV)
{
    USE_OPLINE
    zend_free_op free_op1, free_op2;
    zval *op1, *op2, *result, tmp;

    SAVE_OPLINE();
    op1 = GET_OP1_ZVAL_PTR_DEREF(BP_VAR_R);
    op2 = GET_OP2_ZVAL_PTR_DEREF(BP_VAR_R);
    result = EX_VAR(opline->result.var);

    // if both operands are integers
    if (Z_TYPE_P(op1) == IS_LONG && Z_TYPE_P(op2) == IS_LONG) {
        // for when min and max are integers
    } else if (
        // if both operands are either integers or doubles
        (Z_TYPE_P(op1) == IS_LONG || Z_TYPE_P(op1) == IS_DOUBLE)
        && (Z_TYPE_P(op2) == IS_LONG || Z_TYPE_P(op2) == IS_DOUBLE)
    ) {
        // for when min and max are either integers or floats
    } else {
        // for when min and max are neither integers nor floats
    }

    FREE_OP1();
    FREE_OP2();
    ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
}
The opcode number (182 here) must be one greater than the current highest number, which is given by the ZEND_VM_LAST_OPCODE constant at the end of the opcode list.
Note. The code above contains several pseudo-macros (such as USE_OPLINE and GET_OP1_ZVAL_PTR_DEREF). These are not real C macros: they are replaced by the Zend/zend_vm_gen.php script when the virtual machine is generated, rather than by the preprocessor when the source code is compiled. So if you want to see their definitions, refer to the Zend/zend_vm_gen.php file.
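The ZEND_VM_HANDLER pseudo-macro contains the definition of each opcode. It can take five parameters:
- The opcode number (182).
- The opcode name (ZEND_RANGE).
- The valid types of the left operand (see $vm_op_decode in Zend/zend_vm_gen.php).
- The valid types of the right operand (see $vm_op_decode in Zend/zend_vm_gen.php).
- An optional extended-value flag (see $vm_ext_decode in Zend/zend_vm_gen.php).
With the operand types declared for our handler, the following forms are accepted:
// CONST enables for
1 |> 5.0;
// TMP enables for
(2**2) |> (1 + 3);
// VAR enables for
$cmplx->var |> $var[1];
// CV enables for
$a |> $b;
Note. If one or both operands are not used, they are marked with ANY.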
Note. TMPVAR appeared in ZE 3. It handles the same opcode node types as TMP|VAR, but generates different code. TMPVAR generates a single handler that processes both TMP and VAR, which reduces the size of the virtual machine but requires more conditional logic. TMP|VAR generates separate handlers for TMP and VAR, which increases the size of the virtual machine but requires fewer conditionals.
First, we invoke the USE_OPLINE pseudo-macro to declare the opline variable (a zend_op structure). It is used to read the operands (with pseudo-macros like GET_OP1_ZVAL_PTR_DEREF) and to set the return value of the opcode.
Next come the two zend_free_op variables. These are simple zval pointers declared for each operand we use. They are needed when checking whether an operand has to be freed. Then we declare four zval variables. op1 and op2 are pointers to the zvals that hold the operand values. The result variable stores the result of the opcode. Finally, tmp holds the intermediate value of each iteration of the range loop. This value is copied into the hash table on every iteration.
The op1 and op2 variables are initialized with the GET_OP1_ZVAL_PTR_DEREF and GET_OP2_ZVAL_PTR_DEREF pseudo-macros. These macros are also responsible for initializing the free_op1 and free_op2 variables. The BP_VAR_R constant passed to them is a type flag. Its name stands for BackPatching Variable Read, and it is used when reading compiled variables. Finally, we dereference opline and assign its result zval to result for later use.
Now let's fill in the first if block, which handles the case where min and max are both integers:
zend_long min = Z_LVAL_P(op1), max = Z_LVAL_P(op2);
zend_ulong size, i;

if (min > max) {
    zend_throw_error(NULL, "Min should be less than (or equal to) max");
    HANDLE_EXCEPTION();
}

// calculate size (one less than the total size for an inclusive range);
// the operands are cast first so that the subtraction cannot overflow
size = (zend_ulong) max - (zend_ulong) min;

// the size cannot be greater than or equal to HT_MAX_SIZE
// HT_MAX_SIZE - 1 takes into account the inclusive range size
if (size >= HT_MAX_SIZE - 1) {
    zend_throw_error(NULL, "Range size is too large");
    HANDLE_EXCEPTION();
}

// increment the size to take into account the inclusive range
++size;

// set the zval type to be a long
Z_TYPE_INFO(tmp) = IS_LONG;

// initialise the array to a given size
array_init_size(result, size);
zend_hash_real_init(Z_ARRVAL_P(result), 1);
ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
    for (i = 0; i < size; ++i) {
        Z_LVAL(tmp) = min + i;
        ZEND_HASH_FILL_ADD(&tmp);
    }
} ZEND_HASH_FILL_END();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
We start by declaring the min and max variables. They are declared as zend_long, the type to use for long integers (just as zend_ulong is used for unsigned long integers). The size variable is then declared as zend_ulong; it holds the size of the array to be generated.
Next comes a check: if min > max, an Error exception is thrown. When NULL is passed as the first argument to zend_throw_error, the exception class defaults to Error. This exception could be specialized through inheritance by creating a new class entry in Zend/zend_exceptions.c, but we will talk about that another time. If the exception is thrown, we invoke the HANDLE_EXCEPTION pseudo-macro, which abandons the current handler and passes control on so that the exception can be handled.
Then we calculate the size of the array as max minus min. The result is one less than the actual size, which guards against the overflow that could occur when min = ZEND_LONG_MIN (PHP_INT_MIN) and max = ZEND_LONG_MAX (PHP_INT_MAX).
After that, the size is compared with the HT_MAX_SIZE constant to make sure that the array fits into the hash table. The total size of the array must not be greater than or equal to HT_MAX_SIZE. Otherwise, we again throw an Error exception and leave the handler.
HT_MAX_SIZE = INT_MAX + 1. Once we know that size is below this limit, we can increment it without fear of overflow. That is the next step, so that the value of size matches the size of the range.
Next, we set the type of tmp to IS_LONG. Then we initialize result with the array_init_size macro. This macro sets the type of result to IS_ARRAY_EX, allocates memory for the zend_array structure (the hash table), and sets up the corresponding hash table. Then the zend_hash_real_init function allocates memory for the Bucket structures that hold each element of the array. The second argument, 1, indicates that we want a packed hash table.
Note. A packed hash table is essentially a real array, that is, an array accessed with integer keys (as opposed to typical associative arrays in PHP). This optimization was introduced in PHP 7, because many PHP arrays are indexed with integers (keys in ascending order). Packed hash tables provide direct access to the hash table's bucket array. If you are interested in the details of the new hash table implementation, refer to the article by Nikita.
Note. The _zend_array structure has two aliases: zend_array and HashTable.
Next, we fill the array using the ZEND_HASH_FILL_PACKED macro (definition), which essentially keeps track of the current bucket for later insertion. During array generation, the intermediate result (an array element) is stored in the tmp zval. The ZEND_HASH_FILL_ADD macro creates a copy of tmp, inserts it into the current bucket of the hash table, and moves on to the next bucket for the next iteration.
Finally, the ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION macro (introduced in ZE 3 as a replacement for the separate CHECK_EXCEPTION() and ZEND_VM_NEXT_OPCODE() calls used in ZE 2) checks whether an exception has occurred. None has, so the virtual machine moves on to the next opcode.
Now let's look at the else if block:
long double min, max, size, i;

if (Z_TYPE_P(op1) == IS_LONG) {
    min = (long double) Z_LVAL_P(op1);
    max = (long double) Z_DVAL_P(op2);
} else if (Z_TYPE_P(op2) == IS_LONG) {
    min = (long double) Z_DVAL_P(op1);
    max = (long double) Z_LVAL_P(op2);
} else {
    min = (long double) Z_DVAL_P(op1);
    max = (long double) Z_DVAL_P(op2);
}

if (min > max) {
    zend_throw_error(NULL, "Min should be less than (or equal to) max");
    HANDLE_EXCEPTION();
}

size = max - min;

if (size >= HT_MAX_SIZE - 1) {
    zend_throw_error(NULL, "Range size is too large");
    HANDLE_EXCEPTION();
}

// we cast the size to an integer to get rid of the decimal places,
// since we only care about whole number sizes
size = (int) size + 1;

Z_TYPE_INFO(tmp) = IS_DOUBLE;

array_init_size(result, size);
zend_hash_real_init(Z_ARRVAL_P(result), 1);
ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
    for (i = 0; i < size; ++i) {
        Z_DVAL(tmp) = min + i;
        ZEND_HASH_FILL_ADD(&tmp);
    }
} ZEND_HASH_FILL_END();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
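The overflow reasoning behind the integer size check can be tried in isolation. The sketch below is an assumption-laden illustration rather than engine code: range_count and SKETCH_MAX_SIZE are hypothetical names, the limit is just a stand-in for HT_MAX_SIZE, and fixed-width C types replace zend_long and zend_ulong.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a stand-in for HT_MAX_SIZE, chosen as INT_MAX + 1. */
#define SKETCH_MAX_SIZE ((uint64_t) 0x80000000u)

/* Hypothetical helper: stores the element count of the inclusive range
 * [min, max] in *count. Returns false when min > max or when the range
 * is too large. */
bool range_count(int64_t min, int64_t max, uint64_t *count)
{
    if (min > max) {
        return false;
    }

    /* Unsigned subtraction is well defined for every pair of operands,
     * including INT64_MIN and INT64_MAX. The result is one less than
     * the element count, so it always fits into 64 bits. */
    uint64_t size = (uint64_t) max - (uint64_t) min;

    if (size >= SKETCH_MAX_SIZE - 1) {
        return false;
    }

    /* Safe: size is known to be far below the 64-bit maximum. */
    *count = size + 1;
    return true;
}
```

Keeping size one below the real element count until the limit check has passed is what makes the extreme range from the smallest to the largest integer harmless: the subtraction yields the largest unsigned value instead of wrapping past it.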
Note. We use long double because integer and floating-point operands can be mixed. The precision of double is only 53 bits, so any integer greater than 2^53 may not be represented exactly with this type. A long double typically has a precision of at least 64 bits (this depends on the platform), which allows 64-bit integers to be represented exactly.
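The 53-bit limit of double is easy to observe. The following sketch assumes an IEEE 754 double; survives_double is a hypothetical helper name. It only tests double, because the precision of long double differs between platforms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether a 64-bit integer keeps its exact
 * value after a round trip through double. */
bool survives_double(int64_t v)
{
    /* volatile forces the value to be rounded to double precision */
    volatile double d = (double) v;

    /* Values that round up to 2^63 cannot be converted back safely,
     * so they are rejected before the cast. */
    if (d >= 9223372036854775808.0) {
        return false;
    }
    return (int64_t) d == v;
}
```

With this helper, 2^53 survives the round trip, while 2^53 + 1 does not, because it is rounded to the nearest representable value.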
This code follows the same logic as the integer branch. The differences are that floating-point values are read with Z_DVAL_P, tmp is given the IS_DOUBLE type, and its value is set with Z_DVAL.
What remains is the case where min, max, or both are neither integers nor floats. As stated in the second point of the semantics of our range operator, only integers and floats are supported as operands. In all other cases, an Error exception should be thrown. Let's insert the following code in the else block:
zend_throw_error(NULL, "Unsupported operand types - only ints and floats are supported");
HANDLE_EXCEPTION();
var_dump(1 |> 1.5);
var_dump(PHP_INT_MIN |> PHP_INT_MIN + 1);
Output:
array(1) {
  [0]=>
  float(1)
}
array(2) {
  [0]=>
  int(-9223372036854775808)
  [1]=>
  int(-9223372036854775807)
}
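One problem remains: the operator crashes when it is used inside assert():
assert(1 |> 2); // segfaults
The cause is the pretty printer that assert() relies on: when an assertion fails, the expression is converted back into a string for the error message, and the pretty printer does not yet know about our node. This behaviour appeared in PHP 7.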
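To fix this, the pretty printer must learn about ZEND_AST_RANGE. First, update the precedence table comment (around line 520), giving the new operator priority 170 (this must match zend_language_parser.y):
* 170 non-associative == != === !== |>
Then add a case for ZEND_AST_RANGE to the switch in the zend_ast_export_ex function (for example, right before case ZEND_AST_GREATER):
case ZEND_AST_RANGE: BINARY_OP(" |> ", 170, 171, 171);
case ZEND_AST_GREATER: BINARY_OP(" > ", 180, 181, 181);
case ZEND_AST_GREATER_EQUAL: BINARY_OP(" >= ", 180, 181, 181);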
Now the operator can be used inside assert() without crashing:
assert(false && 1 |> 2); // Warning: assert(): assert(false && 1 |> 2) failed...
Source: https://habr.com/ru/post/276331/