
PHP 7 Virtual Machine

Hello everyone! My name is Konstantin, and I work at Badoo on the Features Team. As you probably know, our backend is written in PHP and serves more than three hundred million users, so I could not pass up the chance to translate this article by PHP developer Nikita Popov. I am sure it will be useful to developers of all levels, although beginners may find it difficult. Enjoy the read!



This article provides an overview of the Zend virtual machine as found in PHP 7. It is not an exhaustive description, but I will try to cover most of the important parts, as well as some of the finer details.

The description targets PHP 7.2 (currently in development), but nearly everything also applies to PHP 7.0 and 7.1. The differences from the PHP 5.x virtual machine are significant, however, and I will generally not draw parallels with it.

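Most of the article deals with things at the level of instruction listings, and only a few sections at the end deal with the actual C-level implementation of the virtual machine. Still, here are the main files that make up the VM (all in the Zend/ directory of the PHP source tree):

 zend_vm_def.h: the VM definition file
 zend_vm_execute.h: the generated virtual machine
 zend_vm_gen.php: the generating script
 zend_execute.c: most of the direct support code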


Opcodes


In the beginning, there was the opcode. By "opcode" we usually mean a full virtual machine instruction (including operands), but the word may also refer only to the "actual" operation code, a small integer that determines the type of instruction. The intended meaning should be clear from context. In the source code, full instructions are usually called oplines.

An individual instruction corresponds to the following zend_op structure:

    struct _zend_op {
        const void *handler;
        znode_op op1;
        znode_op op2;
        znode_op result;
        uint32_t extended_value;
        uint32_t lineno;
        zend_uchar opcode;
        zend_uchar op1_type;
        zend_uchar op2_type;
        zend_uchar result_type;
    };

Thus, opcodes are essentially instructions in "three-address code" format. There is an opcode that determines the instruction type, two input operands op1 and op2, and one output operand result.

Not all instructions use all operands. An ADD instruction (representing the + operator) uses all three. A BOOL_NOT instruction (representing the ! operator) uses only op1 and result. An ECHO instruction uses only op1. Some instructions may either use or not use an operand: for example, DO_FCALL may or may not have a result operand, depending on whether the return value of the function call is used. Some instructions require more than two input operands, in which case they simply use a second dummy instruction (OP_DATA) to carry the extra operands.

Besides these three standard operands, there is an additional numeric extended_value field that can hold further instruction modifiers. For CAST, for example, it may contain the target type to cast to.

Each operand has a type, stored in op1_type, op2_type and result_type respectively. The possible types are IS_UNUSED, IS_CONST, IS_TMPVAR, IS_VAR and IS_CV.

The last three types designate variable operands (corresponding to three different kinds of VM variables). IS_CONST denotes a constant operand (5, "string" or even [1, 2, 3]), while IS_UNUSED denotes an operand that is either actually unused or used as a 32-bit numeric value (an "immediate" operand). Jump instructions, for example, store the jump target in an UNUSED operand.

Obtaining opcode dumps


Throughout the article I will often show fragments of the opcodes that PHP generates. There are currently three ways to obtain such opcode dumps:

    # Opcache, since PHP 7.1
    php -d opcache.opt_debug_level=0x10000 test.php

    # phpdbg, since PHP 5.6
    phpdbg -p* test.php

    # vld, third-party extension
    php -d vld.active=1 test.php

Of these, opcache provides the highest-quality output. The listings in this article are based on opcache dumps, with minor syntax adjustments. The magic number 0x10000 stands for "before optimization", so we see the opcodes as the PHP compiler produced them; 0x20000 gives you the optimized opcodes. Opcache can also produce much more information: for example, 0x40000 generates a CFG, while 0x200000 generates SSA form. But let's not get ahead of ourselves: plain old linearized opcode dumps are sufficient for our purposes.

Variable Types


Probably one of the most important points to understand when dealing with the PHP virtual machine is the three distinct variable types it uses. In PHP 5, TMPVAR, VAR and CV had very different representations on the VM stack and were accessed in very different ways. In PHP 7 they have become very similar, because they share the same storage mechanism. However, there are important differences in the values they can contain and in their semantics.

CV is short for "compiled variable" and refers to a "real" PHP variable. If a function uses the variable $a, there will be a corresponding CV for it.

CVs can have the UNDEF type, which denotes an undefined variable. If an instruction uses an UNDEF CV, in most cases it throws the well-known "undefined variable" notice. On function entry, all non-argument CVs are initialized to UNDEF.

CVs are not consumed by instructions. For example, the instruction ADD $a, $b does not destroy the values stored in $a and $b. Instead, all CVs are destroyed together when they go out of scope. This also implies that all CVs hold a valid value for the entire duration of the function.

TMPVARs and VARs, on the other hand, are virtual machine temporaries. They are typically introduced as the result operand of some operation. For example, the code $a = $b + $c + $d results in an opcode sequence similar to the following:

    T0 = ADD $b, $c
    T1 = ADD T0, $d
    ASSIGN $a, T1

TMP/VARs are always defined before use and as such cannot hold an UNDEF value. Unlike CVs, these variable types are consumed by the instructions that use them. In the example above, the second ADD destroys the value of the T0 operand, and T0 must not be used after this point. Similarly, ASSIGN consumes the value of T1, invalidating T1.

It follows that TMP/VARs are usually very short-lived. In most cases a temporary lives only for the space of a single instruction. Outside this short interval, the value it holds is garbage.
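The consume-on-use rule can be made concrete with a small model. The following is a minimal sketch, not the Zend implementation: values are plain longs, and the slot, operand, op_add and op_assign names are hypothetical. It runs the three-instruction sequence above and tracks which slots are still valid afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: values are plain longs, and every slot
 * carries a "live" flag so we can observe when a temporary is consumed. */
typedef struct {
    long value;
    bool live;
} slot;

typedef enum { OPERAND_CV, OPERAND_TMP } operand_kind;

typedef struct {
    operand_kind kind;
    slot *s;
} operand;

/* Read an operand. TMPs are consumed (destroyed) by the read, CVs are not. */
static long read_operand(operand op) {
    assert(op.s->live);          /* using a dead slot would be a VM bug */
    long v = op.s->value;
    if (op.kind == OPERAND_TMP) {
        op.s->live = false;      /* the instruction frees its TMP operand */
    }
    return v;
}

/* result = ADD op1, op2 */
static void op_add(slot *result, operand op1, operand op2) {
    long a = read_operand(op1);
    long b = read_operand(op2);
    result->value = a + b;
    result->live = true;
}

/* ASSIGN cv, value */
static void op_assign(slot *cv, operand value) {
    cv->value = read_operand(value);
    cv->live = true;
}

/* Runs: T0 = ADD $b, $c; T1 = ADD T0, $d; ASSIGN $a, T1.
 * Returns true if both temporaries were consumed and all CVs survived. */
static bool run_example(long b, long c, long d, long *a_out) {
    slot a = {0, false}, sb = {b, true}, sc = {c, true}, sd = {d, true};
    slot t0 = {0, false}, t1 = {0, false};

    op_add(&t0, (operand){OPERAND_CV, &sb}, (operand){OPERAND_CV, &sc});
    op_add(&t1, (operand){OPERAND_TMP, &t0}, (operand){OPERAND_CV, &sd});
    op_assign(&a, (operand){OPERAND_TMP, &t1});

    *a_out = a.value;
    return !t0.live && !t1.live && sb.live && sc.live && sd.live && a.live;
}
```

Calling run_example(1, 2, 3, &a) leaves 6 in a. In this model both temporaries are dead once the sequence finishes, while the CVs for $b, $c and $d are untouched.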

So what is the difference between TMPs and VARs? Not much. The distinction was inherited from PHP 5, where TMPs were allocated on the VM stack and VARs on the heap. In PHP 7 all variables are stack-allocated. These days, the main difference is that only VARs are allowed to contain references (which lets us skip dereferencing (DEREF) for TMPs). In addition, VARs can hold two kinds of special values, namely class entries and INDIRECT values. The latter are used to handle non-trivial assignments.

This table summarizes the main differences between the variable types:

              UNDEF   REF   INDIRECT   Consumed?   Named?
    CV        yes     yes   no         no          yes
    TMPVAR    no      no    no         yes         no
    VAR       no      yes   yes        yes         no

Op-arrays


All PHP functions are represented as structures that share a common zend_function header. "Function" is understood somewhat broadly here and includes everything from "real" functions and methods to free-standing "pseudo-main" code and eval'd code.

Userland functions use the zend_op_array structure. It has more than 30 fields, so I will start with a reduced version:

    struct _zend_op_array {
        /* Common zend_function header here */
        /* ... */
        uint32_t last;
        zend_op *opcodes;
        int last_var;
        uint32_t T;
        zend_string **vars;
        /* ... */
        int last_literal;
        zval *literals;
        /* ... */
    };

The most important part here is, of course, opcodes, which is an array of opcodes (instructions). last is the number of opcodes in this array. Note that the terminology is confusing here: last sounds like it should be the index of the last opcode, while it really is the number of opcodes (one greater than the last index). The same applies to all other last_* values in the op array structure.

last_var is the number of CVs, and T is the number of TMPs and VARs (in most places we make no strong distinction between them). vars is an array of CV names.

literals is an array of the literal values occurring in the code. This array is what CONST operands refer to. Depending on the ABI, each CONST operand either stores a pointer into this literals table or an offset relative to its start.

There is more to the op array structure than this, but that can wait until later.

Stack frame layout


With the exception of executor globals (EG), all execution state is stored on the virtual machine stack. The VM stack is allocated in pages of 256 KiB, and the individual pages are connected through a linked list.

On each function call, a new stack frame is allocated on the VM stack, with the following layout:

    +----------------------------------------+
    | zend_execute_data                      |
    +----------------------------------------+
    | VAR[0]                =         ARG[1] | arguments
    | ...                                    |
    | VAR[num_args-1]       =         ARG[N] |
    | VAR[num_args]         =   CV[num_args] | remaining CVs
    | ...                                    |
    | VAR[last_var-1]       = CV[last_var-1] |
    | VAR[last_var]         =         TMP[0] | TMP/VARs
    | ...                                    |
    | VAR[last_var+T-1]     =         TMP[T] |
    | ARG[N+1] (extra_args)                  | extra arguments
    | ...                                    |
    +----------------------------------------+

A frame begins with a zend_execute_data structure, followed by an array of variable slots. The slots are all the same (plain zvals), but they are used for different purposes. The first last_var slots are CVs, of which the first num_args hold the function arguments. The CV slots are followed by T slots for TMP/VARs. Finally, there can sometimes be "extra" arguments stored at the end of the frame. These are used for func_get_args().

CV and TMP/VAR operands in instructions are encoded as offsets relative to the start of the stack frame, so fetching a particular variable is a simple read from the location at execute_data plus the given offset.
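To illustrate the offset encoding, here is a minimal sketch. The mini_zval and mini_execute_data types are hypothetical stand-ins (the real structures are larger), and var_offset, frame_var and arg_slot are names invented for this example. It shows how a slot number maps to a byte offset from the frame start, and where extra arguments end up relative to the CVs and TMP/VARs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real zval and zend_execute_data are larger. */
typedef struct { int64_t value; uint32_t type_info; } mini_zval;
typedef struct { const void *opline; void *func; /* ... */ } mini_execute_data;

/* Number of slots the header occupies, rounded up to whole slots. */
#define FRAME_HEADER_SLOTS \
    ((sizeof(mini_execute_data) + sizeof(mini_zval) - 1) / sizeof(mini_zval))

/* Byte offset of variable slot n, measured from the start of the frame.
 * This is the kind of value an instruction operand would store. */
static uint32_t var_offset(uint32_t n) {
    return (uint32_t)((FRAME_HEADER_SLOTS + n) * sizeof(mini_zval));
}

/* Fetch a variable: a single pointer addition, no lookup. */
static mini_zval *frame_var(mini_execute_data *frame, uint32_t offset) {
    return (mini_zval *)((char *)frame + offset);
}

/* Slot number of the i-th passed argument (0-based). Declared arguments
 * occupy the first CV slots; extra arguments go after CVs and TMP/VARs. */
static uint32_t arg_slot(uint32_t i, uint32_t declared_args,
                         uint32_t last_var, uint32_t T) {
    if (i < declared_args) {
        return i;
    }
    return last_var + T + (i - declared_args);
}
```

With two declared arguments, last_var = 3 and T = 4, the third passed argument lands in slot 7 under this model, directly after the last TMP/VAR slot.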

The execute data at the start of the frame is defined as follows:

    struct _zend_execute_data {
        const zend_op       *opline;
        zend_execute_data   *call;
        zval                *return_value;
        zend_function       *func;
        zval                 This;             /* this + call_info + num_args    */
        zend_class_entry    *called_scope;
        zend_execute_data   *prev_execute_data;
        zend_array          *symbol_table;
        void               **run_time_cache;   /* cache op_array->run_time_cache */
        zval                *literals;         /* cache op_array->literals       */
    };

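Most importantly, this structure contains opline, the instruction currently being executed, and func, the function currently being executed. In addition:

 return_value: a pointer to the zval into which the return value will be stored.
 This: the $this object; its spare zval space also encodes the number of arguments and some call metadata flags.
 called_scope: the scope that static:: refers to in PHP code.
 prev_execute_data: the previous stack frame, to which execution returns once this function finishes.
 symbol_table: a typically unused symbol table, needed only when variable variables or similar features are used.
 run_time_cache: a cached copy of the op array's runtime cache pointer, which avoids one indirection on access.
 literals: a cached copy of the op array's literals table, for the same reason.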


Function calls


I omitted one field of the execute_data structure, namely call, because it requires some further context on how function calls work.

All calls use the same sequence of instructions. A call to var_dump($a, $b) in global scope compiles to:

    INIT_FCALL (2 args) "var_dump"
    SEND_VAR $a
    SEND_VAR $b
    V0 = DO_ICALL   # or just DO_ICALL if the retval is unused

There are eight different types of INIT instructions, depending on the kind of call. INIT_FCALL is used for calls to free functions (not methods) that we recognize at compile time. Similarly, there are ten different SEND opcodes, depending on the type of the arguments and of the function. There are only four DO_*CALL opcodes, of which ICALL is used for calls to internal functions.

Although the specific instructions may differ, the structure is always the same: INIT, SEND, DO. The main issue the call sequence has to deal with is nested function calls, which compile to something like this:

    # var_dump(foo($a), bar($b))
    INIT_FCALL (2 args) "var_dump"
        INIT_FCALL (1 arg) "foo"
        SEND_VAR $a
        V0 = DO_UCALL
    SEND_VAR V0
        INIT_FCALL (1 arg) "bar"
        SEND_VAR $b
        V1 = DO_UCALL
    SEND_VAR V1
    V2 = DO_ICALL

I have indented the opcode sequence to visualize which instructions correspond to which call.

The INIT opcode pushes a call frame onto the stack, with enough space for all the variables of the function and the number of arguments we know about (if argument unpacking is involved, we may end up with more arguments). This call frame is initialized with the called function, $this and the called_scope (in this case both of the latter are NULL, because we are calling free functions).

A pointer to the new frame is stored in execute_data->call, where execute_data is the frame of the calling function. From here on we will write such accesses as EX(call). Notably, the prev_execute_data of the new frame is set to the old EX(call) value. For example, the INIT_FCALL for the call to foo sets prev_execute_data to the stack frame of var_dump. Thus, prev_execute_data in this case forms a linked list of "unfinished" calls, whereas usually it provides the backtrace chain.

The SEND opcodes then push arguments into the variable slots of EX(call). At this point all arguments are stored consecutively and may overflow from the section reserved for arguments into other CVs or TMPs. This will be fixed up later.

Finally, DO_FCALL performs the actual call. What was EX(call) becomes the current function, and prev_execute_data is relinked to the calling function. Beyond that, the call procedure depends on the kind of function: internal functions only need to invoke a handler function, while userland functions must finish initializing the stack frame.
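The double role of prev_execute_data is easier to see in code. Below is a minimal sketch of the linking described above, using a hypothetical stripped-down frame type and invented init_call and do_call helpers; the real INIT and DO handlers do considerably more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down frame: only the fields needed for linking. */
typedef struct frame {
    const char *func_name;
    struct frame *call;               /* EX(call): innermost unfinished call */
    struct frame *prev_execute_data;  /* unfinished-call list or caller link */
} frame;

/* INIT_*: register a new call frame as the innermost unfinished call. */
static void init_call(frame *caller, frame *callee, const char *name) {
    callee->func_name = name;
    callee->call = NULL;
    callee->prev_execute_data = caller->call;  /* chain older unfinished call */
    caller->call = callee;
}

/* DO_*: pop the innermost unfinished call and link it to its caller.
 * Returns the frame that becomes the current one. */
static frame *do_call(frame *caller) {
    frame *callee = caller->call;
    assert(callee != NULL);
    caller->call = callee->prev_execute_data;  /* restore outer unfinished call */
    callee->prev_execute_data = caller;        /* now part of the backtrace chain */
    return callee;
}
```

For var_dump(foo($a), bar($b)), the two init_call steps leave foo's frame pointing at var_dump's frame. The do_call for foo then restores var_dump as the innermost unfinished call.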

This initialization involves fixing up the argument stack. PHP allows passing more arguments to a function than it expects (and func_get_args relies on this). However, only the declared arguments have corresponding CVs. Any arguments beyond them would be written into memory reserved for other CVs and TMPs. These extra arguments are therefore moved after the TMPs, so that the arguments end up split into two non-contiguous chunks.

To be clear, calls to userland functions do not involve recursion at the virtual machine level. They only switch from one execute_data to another, while the VM keeps running in a linear loop. Recursive invocations of the virtual machine happen only when internal functions invoke userland callbacks (for example, through array_map). This is why infinite recursion in PHP usually results in a memory limit or OOM error, although it is possible to trigger a stack overflow by recursing through callbacks or magic methods.

Passing arguments


PHP uses a large number of different opcodes for passing arguments, and the differences between them can be confusing, not least because of some unfortunate naming.

SEND_VAL and SEND_VAR are the simplest variants. They handle the case where it is known at compile time that the argument is passed by value. SEND_VAL is used for CONST and TMP operands, while SEND_VAR is used for VARs and CVs.

SEND_REF, conversely, is used for arguments that are known at compile time to be passed by reference. Since only variables can be passed by reference, this opcode accepts only VARs and CVs.

SEND_VAL_EX and SEND_VAR_EX are variants of SEND_VAL and SEND_VAR for cases where we cannot determine statically whether the argument is passed by value or by reference. These opcodes check the kind of argument based on arginfo and behave accordingly. In most cases the actual arginfo structure is not used; instead, a compact bit-vector representation stored directly in the function structure is consulted.

And then there is SEND_VAR_NO_REF_EX. Do not try to read anything into its name: it is outright lying. This opcode is used when passing something that is not really a "variable", but does result in a VAR, to a statically unknown argument. Two particular examples where it is used are passing the result of a function call as an argument, and passing the result of an assignment.

This case needs a separate opcode for two reasons. First, it generates the familiar "Only variables should be passed by reference" notice if you try to pass something like an assignment by reference (had SEND_VAR_EX been used instead, it would have been silently allowed). Second, this opcode handles the case where you want to pass the result of a reference-returning function to a by-reference argument, which should not raise anything. The SEND_VAR_NO_REF variant (without the _EX) is a specialization for the case where we statically know that a reference is expected, but do not know whether the argument is one.

The SEND_UNPACK and SEND_ARRAY opcodes deal with argument unpacking and inlined call_user_func_array calls, respectively. Both push the elements of an array onto the argument stack and differ in various details (for example, unpacking supports Traversables, while call_user_func_array does not). If unpacking is used, it may be necessary to extend the stack frame beyond its previous size, because the real number of arguments is not known at initialization time. In most cases this extension can be done simply by moving the stack top pointer. However, if this would cross a stack page boundary, a new page has to be allocated and the entire call frame (including the arguments already pushed) has to be copied to the new page, because a call frame cannot span a page boundary.

The last opcode, SEND_USER, is used for inlined call_user_func calls and deals with some of its peculiarities.

Although we have not yet discussed the different variable fetch modes, this is a good place to introduce the FUNC_ARG fetch mode. Consider a simple call such as func($a[0][1][2]), for which we do not know at compile time whether the argument will be passed by value or by reference. The behavior differs wildly between the two cases. If the argument is passed by value and $a was previously empty, this may generate a bunch of "undefined index" notices. If it is passed by reference, we have to silently initialize the nested arrays instead.

The FUNC_ARG fetch mode dynamically chooses one of the two behaviors (R or W) by inspecting the arginfo of the current EX(call) function. For the func($a[0][1][2]) example, the opcode sequence might look something like this:

    INIT_FCALL_BY_NAME "func"
    V0 = FETCH_DIM_FUNC_ARG (arg 1) $a, 0
    V1 = FETCH_DIM_FUNC_ARG (arg 1) V0, 1
    V2 = FETCH_DIM_FUNC_ARG (arg 1) V1, 2
    SEND_VAR_EX V2
    DO_FCALL

Fetch modes


The PHP virtual machine has four classes of fetch opcodes:

    FETCH_*             // $_GET, $$var
    FETCH_DIM_*         // $arr[0]
    FETCH_OBJ_*         // $obj->prop
    FETCH_STATIC_PROP_* // A::$prop

These do precisely what one would expect, with the caveat that the basic FETCH_* variant is only used to access variable variables ($$var) and superglobals: normal variable accesses go through the much faster CV mechanism instead.

Each of these fetch opcodes comes in six variants:

    _R
    _RW
    _W
    _IS
    _UNSET
    _FUNC_ARG

We have already seen that _FUNC_ARG chooses between _R and _W depending on whether the function argument is passed by value or by reference. Let's try to create some situations in which we would expect the different fetch types to appear:

    // $arr[0];
    V2 = FETCH_DIM_R $arr int(0)
    FREE V2

    // $arr[0] = $val;
    ASSIGN_DIM $arr int(0)
    OP_DATA $val

    // $arr[0] += 1;
    ASSIGN_ADD (dim) $arr int(0)
    OP_DATA int(1)

    // isset($arr[0]);
    T5 = ISSET_ISEMPTY_DIM_OBJ (isset) $arr int(0)
    FREE T5

    // unset($arr[0]);
    UNSET_DIM $arr int(0)


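Unfortunately, the only actual fetch this produced is FETCH_DIM_R. Everything else is handled through special opcodes. Note that ASSIGN_DIM and ASSIGN_ADD both use an extra OP_DATA, because they need more than two input operands. The reason special opcodes like ASSIGN_DIM are used, instead of something like FETCH_DIM_W + ASSIGN, is (apart from performance) that these operations can be overloaded (for example, in the ASSIGN_DIM case, by an object implementing ArrayAccess::offsetSet()). To actually generate the different fetch types, we need to increase the level of nesting:

    // $arr[0][1];
    V2 = FETCH_DIM_R $arr int(0)
    V3 = FETCH_DIM_R V2 int(1)
    FREE V3

    // $arr[0][1] = $val;
    V4 = FETCH_DIM_W $arr int(0)
    ASSIGN_DIM V4 int(1)
    OP_DATA $val

    // $arr[0][1] += 1;
    V6 = FETCH_DIM_RW $arr int(0)
    ASSIGN_ADD (dim) V6 int(1)
    OP_DATA int(1)

    // isset($arr[0][1]);
    V8 = FETCH_DIM_IS $arr int(0)
    T9 = ISSET_ISEMPTY_DIM_OBJ (isset) V8 int(1)
    FREE T9

    // unset($arr[0][1]);
    V10 = FETCH_DIM_UNSET $arr int(0)
    UNSET_DIM V10 int(1)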


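Here we can see that, while the outermost access uses specialized opcodes, the nested indices are handled by FETCHes with an appropriate fetch mode. The fetch modes essentially differ in whether they generate an "Undefined offset" notice if the index does not exist, and in whether they fetch the value for writing:

             Notice?   Write?
    R        yes       no
    W        no        yes
    RW       yes       yes
    IS       no        no
    UNSET    no        yes-ish

The UNSET case is a bit peculiar, in that it only fetches existing offsets for writing and leaves undefined ones alone. A normal write-fetch would initialize undefined offsets instead.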
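The behavior of the fetch modes in the table above can be sketched on a toy array. This is an assumption-laden model rather than the engine's code: mini_array, fetch_dim and the notices counter are hypothetical, and the treatment of UNSET (existing offsets only, missing ones left alone) follows the "yes-ish" entry in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FETCH_R, FETCH_W, FETCH_RW, FETCH_IS, FETCH_UNSET } fetch_mode;

#define MINI_ARRAY_SIZE 8

/* Hypothetical fixed-size "array": an element exists only if defined[i]. */
typedef struct {
    long values[MINI_ARRAY_SIZE];
    bool defined[MINI_ARRAY_SIZE];
    int notices;                 /* number of "undefined offset" notices */
} mini_array;

/* Fetch element idx according to the mode. Returns NULL when there is
 * nothing to hand out (missing element in R, IS or UNSET mode). */
static long *fetch_dim(mini_array *arr, size_t idx, fetch_mode mode) {
    assert(idx < MINI_ARRAY_SIZE);
    if (arr->defined[idx]) {
        return &arr->values[idx];
    }
    if (mode == FETCH_R || mode == FETCH_RW) {
        arr->notices++;          /* R and RW complain about missing offsets */
    }
    if (mode == FETCH_W || mode == FETCH_RW) {
        arr->values[idx] = 0;    /* write fetches initialize the element */
        arr->defined[idx] = true;
        return &arr->values[idx];
    }
    return NULL;                 /* R, IS and UNSET leave the array untouched */
}
```

In this model only W and RW ever create an element, and only R and RW ever bump the notice counter, which mirrors the two columns of the table.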


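Writes and memory safety


Write fetches return VARs that may contain either a normal zval or an INDIRECT pointer to another zval. In the former case, of course, any changes applied to the zval will not be visible, since the value is only accessible through a VM temporary. While PHP prohibits expressions such as [][0] = 42, we still have to handle cases like call()[0] = 42. Depending on whether call() returns by value or by reference, this expression may or may not have an observable effect.

The more typical case is a fetch that returns an INDIRECT, which contains a pointer to the storage location being modified (for example, a certain location in a hashtable). Unfortunately, such pointers are fragile and easily invalidated: any concurrent write to the array might trigger a reallocation, leaving behind a "dangling" pointer. It is therefore critical to prevent user code from executing between the point where an INDIRECT value is created and the point where it is consumed.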

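Consider this example:

    $arr[a()][b()] = c();

It generates:

    INIT_FCALL_BY_NAME (0 args) "a"
    V1 = DO_FCALL_BY_NAME
    INIT_FCALL_BY_NAME (0 args) "b"
    V3 = DO_FCALL_BY_NAME
    INIT_FCALL_BY_NAME (0 args) "c"
    V5 = DO_FCALL_BY_NAME
    V2 = FETCH_DIM_W $arr V1
    ASSIGN_DIM V2 V3
    OP_DATA V5

Notably, this sequence first executes all side effects from left to right and only then performs the necessary write-fetch (we refer to the FETCH_DIM_W here as a "delayed opline"). This ensures that the write-fetch and the instruction consuming it are directly adjacent.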

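Consider another example:

    $arr[0] =& $arr[1];

Here we have a bit of a problem: both sides of the assignment must be fetched for writing. However, if we fetch $arr[0] for write and then $arr[1] for write, the latter fetch might invalidate the former. This problem is solved as follows:

    V2 = FETCH_DIM_W $arr 1
    V3 = MAKE_REF V2
    V1 = FETCH_DIM_W $arr 0
    ASSIGN_REF V1 V3

Here $arr[1] is fetched for write first and then turned into a reference with MAKE_REF. The result of MAKE_REF is no longer INDIRECT and cannot be invalidated, so the fetch of $arr[0] can be performed safely.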


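Exception handling


Exceptions are the root of all evil.

An exception is generated by writing it into EG(exception), where EG refers to the executor globals. Throwing an exception from C code does not involve stack unwinding; instead, the failure propagates upwards through return values or checks of EG(exception). The exception is only actually handled when control re-enters the virtual machine code.

Nearly all VM instructions can, directly or indirectly, result in an exception under some circumstances. For example, any "undefined variable" notice can result in an exception if a custom error handler is used. We want to avoid checking EG(exception) after every VM instruction. Instead, a small trick is used:

When an exception is thrown, the current opline of the current execute data is replaced with a dummy HANDLE_EXCEPTION opline (this obviously does not modify the op array, it only redirects a pointer). The opline at which the exception originated is backed up in EG(opline_before_exception). This means that, when control returns to the main virtual machine dispatch loop, the HANDLE_EXCEPTION opcode will be invoked. There is a slight problem with this scheme: it requires that a) the opline stored in the execute data is actually the opline currently being executed (otherwise opline_before_exception would be wrong) and b) the virtual machine uses the opline from the execute data to continue execution (otherwise HANDLE_EXCEPTION will not be invoked).

These requirements may sound trivial, but they are not. The reason is that the virtual machine may be working on a different opline variable that is out of sync with the opline stored in the execute data. Before PHP 7 this only happened in the rarely used GOTO and SWITCH virtual machines, while in PHP 7 it is the default mode of operation: if the compiler supports it, the opline is stored in a global register.

Therefore, before performing any operation that might throw, the local opline must be written back into the execute data (the SAVE_OPLINE operation). Similarly, after any potentially throwing operation, the local opline must be reloaded from the execute data (mostly through a CHECK_EXCEPTION operation).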
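The opline-swapping trick can be sketched as a toy dispatch loop. Everything in this sketch is hypothetical (the instr and execute_data types, the OP_MAY_THROW opcode, the run function); it only models the idea that a throw redirects the opline stored in the execute data to a dummy HANDLE_EXCEPTION instruction, and that the local opline has to be saved before and reloaded after a potentially throwing operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_NOP, OP_MAY_THROW, OP_HANDLE_EXCEPTION, OP_RETURN } opcode;
typedef struct { opcode code; bool throws; } instr;
typedef struct { const instr *opline; } execute_data;

/* Dummy instruction that the execute data is redirected to on a throw. */
static const instr handle_exception_op = { OP_HANDLE_EXCEPTION, false };

/* Stand-ins for EG(exception) and EG(opline_before_exception). */
static bool exception_pending = false;
static const instr *opline_before_exception = NULL;

static void throw_exception(execute_data *ex) {
    exception_pending = true;
    opline_before_exception = ex->opline;  /* only correct after a save */
    ex->opline = &handle_exception_op;     /* redirect, op array untouched */
}

/* Runs prog and returns the index of the throwing opline, or -1. */
static ptrdiff_t run(const instr *prog) {
    execute_data ex = { prog };
    const instr *opline = prog;            /* local copy, think "register" */
    exception_pending = false;
    opline_before_exception = NULL;
    for (;;) {
        switch (opline->code) {
        case OP_NOP:
            opline++;                      /* execute data is left stale */
            break;
        case OP_MAY_THROW:
            ex.opline = opline;            /* SAVE_OPLINE */
            if (opline->throws) {
                throw_exception(&ex);
            }
            /* CHECK_EXCEPTION: reload from execute data after a throw */
            opline = exception_pending ? ex.opline : opline + 1;
            break;
        case OP_HANDLE_EXCEPTION:
            return opline_before_exception - prog;
        case OP_RETURN:
            return -1;
        }
    }
}
```

Note that the OP_NOP case never touches ex.opline, so without the save in OP_MAY_THROW the recorded origin of the exception would be wrong. That is exactly requirement a) above.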

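This machinery is what causes a HANDLE_EXCEPTION opcode to run after an exception is thrown. But what does it do? First of all, it determines whether the exception was thrown inside a try block. For this purpose the op array contains an array of try_catch_elements, which track opline offsets for try, catch and finally blocks:

    typedef struct _zend_try_catch_element {
        uint32_t try_op;
        uint32_t catch_op;  /* ketchup! */
        uint32_t finally_op;
        uint32_t finally_end;
    } zend_try_catch_element;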

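For now we will pretend that finally blocks do not exist, as they are a whole different story. Assuming that we are indeed inside a try block, the VM needs to clean up all unfinished operations that started before the throwing opline and do not extend past the end of the try block.

This involves freeing the stack frames and associated data of all calls currently in flight, as well as freeing live temporaries. In most cases temporaries are so short-lived that the consuming instruction directly follows the generating one. However, it can happen that a live range spans several potentially throwing instructions:

    # (array)[] + throwing()
    L0:   T0 = CAST (array) []
    L1:   INIT_FCALL (0 args) "throwing"
    L2:   V1 = DO_FCALL
    L3:   T2 = ADD T0, V1

In this case the variable T0 is live during instructions L1 and L2, and as such needs to be destroyed if the function call throws.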

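One particular kind of temporary tends to have especially long live ranges: loop variables. For example:

    # foreach ($array as $value) throw $ex;
    L0:   V0 = FE_RESET_R $array, ->L4
    L1:   FE_FETCH_R V0, $value, ->L4
    L2:   THROW $ex
    L3:   JMP ->L1
    L4:   FE_FREE V0

Here the "loop variable" V0 lives from L1 to L3 (generally, always spanning the entire loop body). Live ranges are stored in the op array using the following structure:

    typedef struct _zend_live_range {
        uint32_t var; /* low bits are used for variable type (ZEND_LIVE_* macros) */
        uint32_t start;
        uint32_t end;
    } zend_live_range;

Here var is the (operand-encoded) variable the range applies to, start is the start opline offset (not including the generating instruction), and end is the end opline offset (the consuming instruction). Of course, live ranges are only stored if the temporary is not consumed immediately.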
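As a rough sketch of how such ranges could be used during cleanup, the following function collects the variables that are live at a throwing opline. The live_range type mirrors the three fields of the structure above; the live_vars_at name, the assumption that ranges are sorted by start, and the half-open interpretation start <= op_num < end are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t var;    /* variable the range applies to */
    uint32_t start;  /* first opline after the generating instruction */
    uint32_t end;    /* opline of the consuming instruction */
} live_range;

/* Collect variables that are live at the throwing opline op_num, i.e. those
 * with start <= op_num < end. Assumes ranges are sorted by start.
 * Writes at most max entries to out and returns how many were written. */
static size_t live_vars_at(const live_range *ranges, size_t count,
                           uint32_t op_num, uint32_t *out, size_t max) {
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (ranges[i].start > op_num) {
            break;               /* sorted: no later range can match */
        }
        if (op_num < ranges[i].end && found < max) {
            out[found++] = ranges[i].var;
        }
    }
    return found;
}
```

For the (array)[] + throwing() listing, T0 would have the range {start = 1, end = 3} under this interpretation: a throw at L2 frees it, while a throw at the consuming ADD at L3 does not, since the consuming instruction itself is expected to free its operands.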

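The lower bits of var store the type of the variable, which can be one of:

 ZEND_LIVE_TMPVAR: a "normal" variable holding an ordinary zval; freeing it behaves like a FREE opcode.
 ZEND_LIVE_LOOP: a foreach loop variable, which holds more than a plain zval; freeing it corresponds to FE_FREE.
 ZEND_LIVE_SILENCE: used to implement the error suppression operator.
 ZEND_LIVE_ROPE: used for rope string concatenation.

A tricky question in this context is whether a temporary should be freed if either its generating or its consuming instruction throws. Consider the following simple code:

    T2 = ADD T0, T1
    ASSIGN $v, T2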

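If the ADD throws, should the temporary T2 be freed automatically, or is the ADD instruction responsible for this? Similarly, if the ASSIGN throws, should T2 be freed automatically, or must the ASSIGN take care of it itself? In the latter case the answer is clear: an instruction is always responsible for freeing its operands, even if an exception is thrown.

The case of the result operand is trickier, because the answer changed between PHP 7.1 and 7.2. In PHP 7.1 the instruction was responsible for freeing the result in case of an exception. In PHP 7.2 it is freed automatically (and the instruction is responsible for making sure the result is always populated). The motivation for this change is the way many basic instructions (such as ADD) are implemented. Their usual structure is roughly:

  1. Read the input operands.
  2. Perform the operation and write the outcome into the result operand.
  3. Free the input operands (if necessary).

This is problematic, because PHP not only supports exceptions and destructors, but also throwing destructors (the point at which compiler engineers cry out in horror). As such, step 3 can throw when the result has already been populated. To avoid memory leaks in this edge case, responsibility for freeing the result operand was shifted from the instruction to the exception handling mechanism.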

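Once these cleanup operations are done, execution can continue in the catch block. If there is no catch (and no finally), we unwind the stack, that is, destroy the current stack frame and give the parent frame a chance to handle the exception.

To give you a full appreciation of how ugly the whole exception handling business is, here is another tidbit related to throwing destructors. It is not remotely relevant in practice, but it still has to be handled to ensure correctness.
Consider this code:

    foreach (new Dtor as $value) {
        try {
            echo "Return";
            return;
        } catch (Exception $e) {
            echo "Catch";
        }
    }

Now imagine that Dtor is a Traversable class with a throwing destructor. This code results in the following opcode sequence (the loop body is indented for readability):

    L0:   V0 = NEW 'Dtor', ->L2
    L1:   DO_FCALL
    L2:   V2 = FE_RESET_R V0, ->L11
    L3:   FE_FETCH_R V2, $value
    L4:       ECHO 'Return'
    L5:       FE_FREE (free on return) V2   # <- return
    L6:       RETURN null                   # <- return
    L7:       JMP ->L10
    L8:       CATCH 'Exception' $e
    L9:       ECHO 'Catch'
    L10:  JMP ->L3
    L11:  FE_FREE V2                        # <- the duplicated instr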

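Importantly, note that the return is compiled into an FE_FREE of the loop variable followed by a RETURN. Now, what happens if that FE_FREE throws, because Dtor has a throwing destructor? Normally we would say that this instruction is within the try block, so the catch should be invoked. However, at this point the loop variable has already been destroyed! The catch would discard the exception, and we would try to continue iterating over an already dead loop variable.

The cause of the problem is that, although the throwing FE_FREE is inside the try block, it is a copy of the FE_FREE at L11. Logically, that is where the exception "really" occurred. This is why the FE_FREE generated for the return is annotated as FREE_ON_RETURN. The annotation instructs the exception handling mechanism to move the source of the exception to the original freeing instruction. As a result, the code above will not run the catch block and will generate an uncaught exception instead.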

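Finally handling


PHP's history with finally blocks is somewhat troubled. PHP 5.5 first introduced finally blocks, or rather a really buggy implementation of them. Each of PHP 5.6, 7.0 and 7.1 shipped major rewrites of this functionality, each fixing a whole slew of bugs but not quite reaching a fully correct implementation. It looks like PHP 7.1 finally managed to hit the nail on the head (fingers crossed).

While writing this section, I was surprised to find that, from the perspective of the current implementation, finally handling is actually not all that complicated. In many ways the implementation became simpler, not more complex, across the iterations. This goes to show how an insufficient understanding of a problem can lead to an implementation that is both excessively complex and buggy (although, to be fair, part of the complexity of the PHP 5 implementation stemmed directly from the lack of an AST).

Finally blocks run whenever control exits a try block, either normally (for example, using return) or abnormally (by throwing). There are a few interesting edge cases to consider, which I will quickly illustrate before going into the implementation. Consider:

    try {
        throw new Exception();
    } finally {
        return 42;
    }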

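What happens? The finally wins, and the function returns 42.

Now consider:

    try {
        return 24;
    } finally {
        return 42;
    }

Again the finally wins, and the function returns 42. The finally always wins.

PHP prohibits jumps out of finally blocks. For example, the following is forbidden:

    foreach ($array as $value) {
        try {
            return 42;
        } finally {
            continue;
        }
    }

The continue in this code generates a compile error. It is important to understand that this limitation is purely cosmetic and can easily be worked around by delegating the jump to a catch block:

    foreach ($array as $value) {
        try {
            try {
                return 42;
            } finally {
                throw new JumpException;
            }
        } catch (JumpException $e) {
            continue;
        }
    }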

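The only real limitation is that it is not possible to jump into a finally block: for example, a goto from outside a finally to a label inside a finally is forbidden.

With the preliminaries out of the way, we can look at how finally works. The implementation uses two opcodes: FAST_CALL and FAST_RET. Roughly speaking, FAST_CALL is for jumping into a finally block and FAST_RET is for jumping out of it. Let's consider the simplest case:

    try {
        echo "try";
    } finally {
        echo "finally";
    }
    echo "finished";

This code compiles to the following opcode sequence:

    L0:   ECHO string("try")
    L1:   T0 = FAST_CALL ->L3
    L2:   JMP ->L5
    L3:   ECHO string("finally")
    L4:   FAST_RET T0
    L5:   ECHO string("finished")
    L6:   RETURN int(1)

FAST_CALL stores its own location in T0 and jumps into the finally block at L3. When FAST_RET is reached, it jumps back to (one past) the location stored in T0. In this case that is L2, which is just a jump around the finally block. This is the base case, where no special control flow (returns or exceptions) occurs.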
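The base case can be traced with a tiny interpreter. The sketch below is a hypothetical model (the instr type, the run function and the single return_slot standing in for T0 are all invented here). It encodes the L0 to L6 listing above and assumes that FAST_RET resumes one instruction past the recorded FAST_CALL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { OP_ECHO, OP_FAST_CALL, OP_JMP, OP_FAST_RET, OP_RETURN } opcode;

typedef struct {
    opcode code;
    const char *text;   /* string operand of ECHO */
    size_t target;      /* jump target of FAST_CALL and JMP */
} instr;

/* Executes the program and appends ECHO output to out (capacity cap > 0).
 * A single return_slot plays the role of the temporary T0. */
static void run(const instr *prog, char *out, size_t cap) {
    size_t pc = 0;
    size_t return_slot = 0;
    out[0] = '\0';
    for (;;) {
        const instr *op = &prog[pc];
        switch (op->code) {
        case OP_ECHO:
            strncat(out, op->text, cap - strlen(out) - 1);
            pc++;
            break;
        case OP_FAST_CALL:
            return_slot = pc;       /* remember where we came from */
            pc = op->target;        /* enter the finally block */
            break;
        case OP_JMP:
            pc = op->target;
            break;
        case OP_FAST_RET:
            pc = return_slot + 1;   /* resume one past the FAST_CALL */
            break;
        case OP_RETURN:
            return;
        }
    }
}

/* The opcode sequence L0..L6 from the listing above. */
static const instr example[] = {
    {OP_ECHO, "try", 0},
    {OP_FAST_CALL, NULL, 3},
    {OP_JMP, NULL, 5},
    {OP_ECHO, "finally", 0},
    {OP_FAST_RET, NULL, 0},
    {OP_ECHO, "finished", 0},
    {OP_RETURN, NULL, 0},
};
```

Running the example visits L0, L1, L3, L4, L2, L5 and L6 in that order, so the echoed strings appear as try, finally, finished.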

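Now let's consider the exceptional case:

    try {
        throw new Exception("try");
    } catch (Exception $e) {
        throw new Exception("catch");
    } finally {
        throw new Exception("finally");
    }

When handling an exception, we have to consider where it was thrown relative to the closest surrounding try/catch/finally block:

  1. Throw from try, with a matching catch: populate $e and jump into the catch.
  2. Throw from catch, or from try without a matching catch, when there is a finally block: jump into the finally block and back up the exception in the FAST_CALL temporary (instead of storing a return address there).
  3. Throw from finally: if there is a backed-up exception in the FAST_CALL temporary, chain it as the previous exception of the thrown one. Continue bubbling the exception up to the next try/catch/finally.
  4. Otherwise: continue bubbling the exception up to the next try/catch/finally.

In this example we go through the first three steps: first the try throws, triggering a jump into the catch. The catch also throws, triggering a jump into the finally block, with the exception backed up in the FAST_CALL temporary. The finally block then throws as well, so the "finally" exception bubbles up with the "catch" exception set as its previous exception.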

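A small variation on the previous example is the following code:

    try {
        try {
            throw new Exception("try");
        } finally {}
    } catch (Exception $e) {
        try {
            throw new Exception("catch");
        } finally {}
    } finally {
        try {
            throw new Exception("finally");
        } finally {}
    }

All the inner finally blocks here are entered exceptionally, but left normally (through FAST_RET). In this case the exception handling procedure described above is resumed starting from the parent try/catch/finally block. This parent try/catch is stored in the FAST_RET opcode (shown in dumps as "try-catch(0)").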

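That essentially covers the interaction of finally and exceptions. But what about a return in finally?

    try {
        throw new Exception("try");
    } finally {
        return 42;
    }

The relevant portion of the opcode sequence is this:

    L4:   T0 = FAST_CALL ->L6
    L5:   JMP ->L9
    L6:   DISCARD_EXCEPTION T0
    L7:   RETURN 42
    L8:   FAST_RET T0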

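The additional DISCARD_EXCEPTION opcode is responsible for discarding the exception thrown in the try block (remember: the return in the finally wins). What about a return in try?

    try {
        $a = 42;
        return $a;
    } finally {
        ++$a;
    }

The expected return value here is 42, not 43. The return value is determined by the return $a line, and any later modification of $a should not matter. The code results in:

    L0:   ASSIGN $a, 42
    L1:   T3 = QM_ASSIGN $a
    L2:   T1 = FAST_CALL ->L6, T3
    L3:   RETURN T3
    L4:   T1 = FAST_CALL ->L6      # unreachable
    L5:   JMP ->L8                 # unreachable
    L6:   PRE_INC $a
    L7:   FAST_RET T1
    L8:   RETURN null

Two of the opcodes are unreachable, as they occur after a return. They will be removed during optimization, but these are unoptimized opcodes. There are two interesting things here. First, $a is copied into T3 using QM_ASSIGN (which is basically a "copy into temporary" instruction). This is what prevents the later modification of $a from affecting the return value. Second, T3 is also passed to FAST_CALL, which backs up the value in T1. If the return from the try block is later discarded (for example, because the finally throws or returns), this mechanism is used to free the unused return value.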

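Each of these mechanisms is simple on its own, but some care is needed when they are combined. Consider the following example, where Dtor is again some Traversable class with a throwing destructor:

    try {
        foreach (new Dtor as $v) {
            try {
                return 1;
            } finally {
                return 2;
            }
        }
    } finally {
        echo "finally";
    }

This code generates the following opcodes:

    L0:   V2 = NEW (0 args) "Dtor"
    L1:   DO_FCALL
    L2:   V4 = FE_RESET_R V2 ->L16
    L3:   FE_FETCH_R V4 $v ->L16
    L4:       T5 = FAST_CALL ->L10         # inner try
    L5:       FE_FREE (free on return) V4
    L6:       T1 = FAST_CALL ->L19
    L7:       RETURN 1
    L8:       T5 = FAST_CALL ->L10         # unreachable
    L9:       JMP ->L15
    L10:      DISCARD_EXCEPTION T5         # inner finally
    L11:      FE_FREE (free on return) V4
    L12:      T1 = FAST_CALL ->L19
    L13:      RETURN 2
    L14:      FAST_RET T5 try-catch(0)
    L15:  JMP ->L3
    L16:  FE_FREE V4
    L17:  T1 = FAST_CALL ->L19
    L18:  JMP ->L21
    L19:  ECHO "finally"                   # outer finally
    L20:  FAST_RET T1

The sequence for the first return (from the try) is FAST_CALL L10, FE_FREE V4, FAST_CALL L19, RETURN. This first calls into the inner finally block, then frees the foreach loop variable, then calls into the outer finally block, and then returns. The sequence for the second return (from the finally) is DISCARD_EXCEPTION T5, FE_FREE V4, FAST_CALL L19. This first discards the exception (or here, the return value) of the inner try block, then frees the foreach loop variable, and finally calls into the outer finally block. Note that in both cases the order of these instructions is the reverse of the order of the relevant blocks in the source code.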

Generators


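Generator functions may be paused and resumed, and consequently require special VM stack management. Here is a simple generator:

    function gen($x) {
        foo(yield $x);
    }

It yields the following opcodes:

    $x = RECV 1
    GENERATOR_CREATE
    INIT_FCALL_BY_NAME (1 args) string("foo")
    V1 = YIELD $x
    SEND_VAR_NO_REF_EX V1 1
    DO_FCALL_BY_NAME
    GENERATOR_RETURN null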


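Until GENERATOR_CREATE is reached, this is executed as a normal function on the normal VM stack. GENERATOR_CREATE then creates a Generator object, as well as a heap-allocated execute_data structure (including slots for variables and arguments, as usual), into which the execute_data on the VM stack is copied.

When the generator is resumed, the executor uses the heap-allocated execute_data, but continues to use the main VM stack to push call frames. An obvious problem with this is that a generator can be interrupted while a call is in progress, as the example above shows. Here the YIELD executes at a point where the call frame for the call to foo() has already been pushed onto the VM stack.

This relatively uncommon case is handled by copying the active call frames into the generator structure when control is yielded, and restoring them when the generator is resumed.
This design has been used since PHP 7.1. Previously, each generator had its own 4 KiB VM page, which was swapped into the executor when the generator was resumed. That avoided the need to copy call frames, but increased memory usage.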

Smart branches


Comparison instructions are very often followed directly by conditional jumps. For example:

    L0:   T2 = IS_EQUAL $a, $b
    L1:   JMPZ T2 ->L3
    L2:   ECHO "equal"

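Because this pattern is so common, all the comparison opcodes (such as IS_EQUAL) implement a smart branch mechanism: they check whether the next instruction is a JMPZ or JMPNZ and, if so, perform the respective jump themselves.

The smart branch mechanism only checks whether the next instruction is a JMPZ/JMPNZ; it does not check whether the jump's operand is actually the result of the comparison or something else. This requires special care in cases where the comparison and the subsequent jump are unrelated. For example, the code ($a == $b) + ($d ? $e : $f) generates:

    L0:   T5 = IS_EQUAL $a, $b
    L1:   NOP
    L2:   JMPZ $d ->L5
    L3:   T6 = QM_ASSIGN $e
    L4:   JMP ->L6
    L5:   T6 = QM_ASSIGN $f
    L6:   T7 = ADD T5 T6
    L7:   FREE T7

Note that a NOP has been inserted between the IS_EQUAL and the JMPZ. If this NOP were not present, the branch would end up using the IS_EQUAL result rather than the JMPZ operand.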
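A smart branch can be sketched as a comparison handler that peeks at the following instruction. The instr type and the exec_is_equal function below are hypothetical; the sketch assumes the program always has an instruction after the comparison. It also shows why an intervening NOP disables the mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_IS_EQUAL, OP_JMPZ, OP_JMPNZ, OP_NOP, OP_OTHER } opcode;

typedef struct {
    opcode code;
    size_t target;   /* jump target for JMPZ/JMPNZ */
} instr;

/* Executes the IS_EQUAL at prog[pc] and returns the index of the next
 * instruction to run. If a conditional jump follows directly, it is
 * performed here and skipped; otherwise the result is stored in *result.
 * Assumes prog[pc + 1] exists. */
static size_t exec_is_equal(const instr *prog, size_t pc,
                            long a, long b, bool *result) {
    bool equal = (a == b);
    const instr *next = &prog[pc + 1];
    if (next->code == OP_JMPZ) {
        return equal ? pc + 2 : next->target;   /* jump if "zero" (false) */
    }
    if (next->code == OP_JMPNZ) {
        return equal ? next->target : pc + 2;   /* jump if non-zero (true) */
    }
    *result = equal;  /* no smart branch: materialize the boolean */
    return pc + 1;
}
```

When the comparison is followed directly by JMPZ, the boolean is never written anywhere. With a NOP in between, the result is stored and the later JMPZ is executed as an ordinary instruction with its own operand.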

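Runtime cache


Because opcode arrays are shared between multiple processes (without locks), they are strictly immutable. However, runtime values may be cached in a separate "runtime cache", which is basically an array of pointers. Literals may have an associated runtime cache entry (or more than one).

Runtime cache entries come in two types. The first are ordinary cache entries, such as the one used by INIT_FCALL. After INIT_FCALL has looked up the called function once (based on its name), the function pointer is cached in the associated runtime cache slot.

The second type are polymorphic cache entries, which are simply two consecutive cache slots, where the first stores a class entry and the second the actual datum. These are used for operations like FETCH_OBJ_R, where the offset of the property in the property table of a certain class is cached. If the next access happens on the same class (which is quite likely), the cached value is used. Otherwise a more expensive lookup is performed, and the result is cached for the new class entry.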
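A polymorphic cache entry can be sketched as two adjacent pointer slots. The names below (class_entry, runtime_cache, slow_property_offset, cached_property_offset) are hypothetical, and the "slow lookup" is a stand-in that merely counts how often it is called; the sketch assumes the datum fits into a pointer-sized slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class entry; only its identity (address) matters here. */
typedef struct { const char *name; } class_entry;

/* Runtime cache: an array of pointer-sized slots. A polymorphic entry
 * uses two consecutive slots: [class entry, cached datum]. */
typedef struct { void *slots[8]; } runtime_cache;

static int slow_lookups = 0;

/* Stand-in for the expensive property lookup (assumed, not the real API). */
static uintptr_t slow_property_offset(const class_entry *ce) {
    slow_lookups++;
    return (ce->name[0] == 'A') ? 16 : 32;
}

static uintptr_t cached_property_offset(runtime_cache *cache, size_t slot,
                                        const class_entry *ce) {
    if (cache->slots[slot] == (const void *)ce) {
        return (uintptr_t)cache->slots[slot + 1];   /* cache hit */
    }
    uintptr_t offset = slow_property_offset(ce);
    cache->slots[slot] = (void *)ce;                /* remember the class */
    cache->slots[slot + 1] = (void *)offset;        /* and the datum */
    return offset;
}
```

Repeated accesses on objects of the same class hit the first branch. An access on a different class overwrites both slots, so the cache always reflects the most recently seen class.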


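VM interrupts


Prior to PHP 7.0, execution timeouts were handled by a longjmp into the shutdown sequence directly from the signal handler. As you may imagine, this caused all manner of unpleasantness. Since PHP 7.0, timeouts are instead delayed until control returns to the virtual machine. If it does not return within a certain grace period, the process is aborted. Since PHP 7.1, pcntl signal handlers use the same mechanism as execution timeouts.

When a signal is pending, a VM interrupt flag is set, and this flag is checked by the virtual machine at certain points. The check is not performed at every instruction, but only at jumps and calls. As such, the interrupt is not handled immediately on return to the VM, but at the end of the current section of linear control flow.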

Specialization


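If you take a look at the VM definition file, you will find that opcode handlers are defined as follows:

    ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMPVAR|CV, CONST|TMPVAR|CV)

Here 1 is the opcode number and ZEND_ADD its name, while the other two arguments specify which operand types the instruction accepts. The generated virtual machine code (produced by zend_vm_gen.php) then contains specialized handlers for each possible combination of operand types. They have names like ZEND_ADD_SPEC_CONST_CONST_HANDLER.

The specialized handlers are generated by replacing certain macros in the handler body. The obvious ones are OP1_TYPE and OP2_TYPE, but operations such as GET_OP1_ZVAL_PTR() and FREE_OP1() are specialized as well.

The handler for ADD specifies that it accepts CONST|TMPVAR|CV operands. TMPVAR here means that the opcode accepts both TMPs and VARs, but asks for them not to be specialized separately. Recall that for most purposes the only difference between TMP and VAR is that the latter can contain references. For an opcode like ADD (where references are on the slow path anyway), a separate specialization is not worthwhile. Opcodes that do make this distinction use TMP|VAR in their operand list.

Besides operand-type based specialization, handlers can also be specialized on other factors, such as whether their return value is used.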

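ASSIGN_DIM is specialized on the operand type of the following OP_DATA opcode:

    ZEND_VM_HANDLER(147, ZEND_ASSIGN_DIM,
        VAR|CV, CONST|TMPVAR|UNUSED|NEXT|CV, SPEC(OP_DATA=CONST|TMP|VAR|CV))

Based on this signature, 2*4*4=32 different variants of ASSIGN_DIM are generated.

The specification for the second operand also contains an entry for NEXT. This is not related to specialization. Instead, it specifies what an UNUSED operand means in this context: an append operation ($arr[]).

Another example:

    ZEND_VM_HANDLER(23, ZEND_ASSIGN_ADD,
        VAR|UNUSED|THIS|CV, CONST|TMPVAR|UNUSED|NEXT|CV, DIM_OBJ, SPEC(DIM_OBJ))

Here an UNUSED first operand implies an access on $this. This is a general convention for object-related opcodes (for example, FETCH_OBJ_R UNUSED, 'prop' corresponds to $this->prop). An UNUSED second operand again implies an append operation. The third argument specifies the meaning of extended_value: it contains a flag that distinguishes between $a += 1, $a[$b] += 1 and $a->b += 1. Finally, SPEC(DIM_OBJ) instructs the generator to produce a specialized handler for each of those. In this case the total number of generated handlers is not trivial to compute, because the VM generator knows that certain combinations are impossible. For example, an UNUSED op1 is only relevant in the OBJ case.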

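Finally, the virtual machine generator supports an additional, more sophisticated specialization mechanism. Towards the end of the definition file, you will find a number of handlers of this form:

    ZEND_VM_TYPE_SPEC_HANDLER(
        ZEND_ADD,
        (res_info == MAY_BE_LONG && op1_info == MAY_BE_LONG && op2_info == MAY_BE_LONG),
        ZEND_ADD_LONG_NO_OVERFLOW,
        CONST|TMPVARCV, CONST|TMPVARCV, SPEC(NO_CONST_CONST,COMMUTATIVE)
    )

These handlers are specialized not only on the operand kind, but also on the possible types the operands may take at runtime. The mechanism that determines possible operand types is part of the opcache optimization infrastructure and is outside the scope of this article. However, assuming such information is available, it should be clear that this is a handler for an addition of the form int + int -> int. Additionally, the SPEC annotation tells the specializer that variants for two constant operands should not be generated, and that the operation is commutative, so that if a CONST + TMPVARCV specialization already exists, TMPVARCV + CONST need not be generated as well.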


Fast-path / slow-path split


Many opcode handlers are implemented with a fast-path / slow-path split, in which a few common cases are handled first, before falling back to a generic implementation.

It is about time we looked at some actual code, so here is the entire SL (shift left) implementation:

    ZEND_VM_HANDLER(6, ZEND_SL, CONST|TMPVAR|CV, CONST|TMPVAR|CV)
    {
        USE_OPLINE
        zend_free_op free_op1, free_op2;
        zval *op1, *op2;

        op1 = GET_OP1_ZVAL_PTR_UNDEF(BP_VAR_R);
        op2 = GET_OP2_ZVAL_PTR_UNDEF(BP_VAR_R);
        if (EXPECTED(Z_TYPE_INFO_P(op1) == IS_LONG)
                && EXPECTED(Z_TYPE_INFO_P(op2) == IS_LONG)
                && EXPECTED((zend_ulong)Z_LVAL_P(op2) < SIZEOF_ZEND_LONG * 8)) {
            ZVAL_LONG(EX_VAR(opline->result.var), Z_LVAL_P(op1) << Z_LVAL_P(op2));
            ZEND_VM_NEXT_OPCODE();
        }

        SAVE_OPLINE();
        if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(op1) == IS_UNDEF)) {
            op1 = GET_OP1_UNDEF_CV(op1, BP_VAR_R);
        }
        if (OP2_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(op2) == IS_UNDEF)) {
            op2 = GET_OP2_UNDEF_CV(op2, BP_VAR_R);
        }
        shift_left_function(EX_VAR(opline->result.var), op1, op2);
        FREE_OP1();
        FREE_OP2();
        ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
    }

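The implementation starts by fetching the operands using GET_OPn_ZVAL_PTR_UNDEF in BP_VAR_R mode. The UNDEF part means that no check for undefined variables is performed in the CV case; you simply get the UNDEF value back as-is. Once we have the operands, we check whether both are integers and the shift width is in range, in which case the result can be computed directly and we advance to the next opcode. Notably, the type check does not care whether the operands are UNDEF, so the use of GET_OPn_ZVAL_PTR_UNDEF is justified.

If the operands do not satisfy the fast path, we fall back to the generic implementation, which starts with SAVE_OPLINE(). This is our signal that "potentially throwing operations follow". Before going any further, the case of undefined variables is handled. GET_OPn_UNDEF_CV emits an undefined variable notice in this case and returns a NULL value.

Next, the generic shift_left_function is called and writes its result into EX_VAR(opline->result.var). Finally, the input operands are freed (if necessary) and we advance to the next opcode with an exception check (which means the opline is reloaded before advancing).

As such, the fast path saves two checks for undefined variables, a call to a generic operator function, the freeing of operands, and the saving and reloading of the opline for exception handling. Most performance-sensitive opcodes are laid out in a similar fashion.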


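VM macros


As the previous listing shows, the virtual machine implementation makes liberal use of macros. Some of these are normal C macros, while others are resolved during generation of the virtual machine. In particular, this includes a number of macros for fetching and freeing instruction operands:

    OPn_TYPE
    OP_DATA_TYPE

    GET_OPn_ZVAL_PTR(BP_VAR_*)
    GET_OPn_ZVAL_PTR_DEREF(BP_VAR_*)
    GET_OPn_ZVAL_PTR_UNDEF(BP_VAR_*)
    GET_OPn_ZVAL_PTR_PTR(BP_VAR_*)
    GET_OPn_ZVAL_PTR_PTR_UNDEF(BP_VAR_*)
    GET_OPn_OBJ_ZVAL_PTR(BP_VAR_*)
    GET_OPn_OBJ_ZVAL_PTR_UNDEF(BP_VAR_*)
    GET_OPn_OBJ_ZVAL_PTR_DEREF(BP_VAR_*)
    GET_OPn_OBJ_ZVAL_PTR_PTR(BP_VAR_*)
    GET_OPn_OBJ_ZVAL_PTR_PTR_UNDEF(BP_VAR_*)
    GET_OP_DATA_ZVAL_PTR()
    GET_OP_DATA_ZVAL_PTR_DEREF()

    FREE_OPn()
    FREE_OPn_IF_VAR()
    FREE_OPn_VAR_PTR()
    FREE_UNFETCHED_OPn()
    FREE_OP_DATA()
    FREE_UNFETCHED_OP_DATA()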


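As you can see, there are quite a few variations. The BP_VAR_* arguments specify the fetch mode and support the same modes as the FETCH_* instructions (with the exception of FUNC_ARG).

GET_OPn_ZVAL_PTR() is the basic operand fetch. It throws a notice on an undefined CV and does not dereference the operand. GET_OPn_ZVAL_PTR_UNDEF() is, as we have already seen, a variant that does not check for undefined CVs. GET_OPn_ZVAL_PTR_DEREF() includes a DEREF of the zval. This is part of the specialized GET operation, because dereferencing is only necessary for CVs and VARs, not for CONSTs and TMPs. Because this macro needs to distinguish between TMPs and VARs, it can only be used with TMP|VAR specialization (not with TMPVAR).

The GET_OPn_OBJ_ZVAL_PTR*() variants additionally handle the case of an UNUSED operand. As mentioned before, $this accesses use an UNUSED operand by convention, so the GET_OPn_OBJ_ZVAL_PTR*() macros return a reference to EX(This) for UNUSED operands.

Finally, there are some PTR_PTR variants. The name is a leftover from PHP 5 times, where doubly-indirected zval pointers were used. These macros are used in write operations and as such only support CV and VAR operands (anything else returns NULL). They differ from normal PTR fetches in that they de-INDIRECT VAR operands.

The FREE_OP*() macros are then used to free the fetched operands. To operate, they require the definition of a zend_free_op free_opN variable, into which the GET operation stores the value to free. The basic FREE_OPn() operation frees TMPs and VARs, but not CVs and CONSTs. FREE_OPn_IF_VAR() does exactly what it says: it frees the operand only if it is a VAR.

The FREE_OP*_VAR_PTR() variant is used in conjunction with PTR_PTR fetches. It only frees VAR operands, and only if they are not INDIRECT.

The FREE_UNFETCHED_OP*() variants are used in cases where an operand must be freed before it has been fetched with GET. This typically happens if an exception is thrown prior to operand fetching.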

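Apart from these specialized macros, there are also quite a few macros of the more ordinary sort. The VM defines several macros that control what happens after an opcode handler has run:

    ZEND_VM_CONTINUE()
    ZEND_VM_ENTER()
    ZEND_VM_LEAVE()
    ZEND_VM_RETURN()

CONTINUE continues executing opcodes as normal, while ENTER and LEAVE are used to enter or leave a nested function call. The specifics of how they operate depend on how exactly the VM is compiled (for example, whether global registers are used and, if so, which). In broad terms, they synchronize some state before continuing. RETURN is used to actually exit the main VM loop.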

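ZEND_VM_CONTINUE() expects the opline to have been updated beforehand. Of course, there are more macros that do related things:

                                               Continue?   Check exception?   Check interrupt?
    ZEND_VM_NEXT_OPCODE()                      yes         no                 no
    ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION()      yes         yes                no
    ZEND_VM_SET_NEXT_OPCODE(op)                no          no                 no
    ZEND_VM_SET_OPCODE(op)                     no          no                 yes
    ZEND_VM_SET_RELATIVE_OPCODE(op, offset)    no          no                 yes
    ZEND_VM_JMP(op)                            yes         yes                yes

The table shows whether each macro includes an implicit ZEND_VM_CONTINUE(), whether it checks for exceptions, and whether it checks for VM interrupts.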

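Next to these, there are also SAVE_OPLINE(), LOAD_OPLINE() and HANDLE_EXCEPTION(). As mentioned in the section on exception handling, SAVE_OPLINE() is used before the first potentially throwing operation in an opcode handler. If necessary, it writes the opline used by the VM (which might be in a global register) back into the execute data. LOAD_OPLINE() is the reverse operation, but nowadays it sees little use, because it has effectively been rolled into ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION() and ZEND_VM_JMP(). HANDLE_EXCEPTION() is used to return from an opcode handler once you already know that an exception has been thrown. It performs a combination of LOAD_OPLINE and CONTINUE, which effectively dispatches to the HANDLE_EXCEPTION opcode.

Of course, there are more macros (there are always more macros...), but this should cover the most important parts.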

Source: https://habr.com/ru/post/327068/

