
About the intricacies of foreach in PHP

In a recent digest of interesting PHP links, I came across Nikita Popov's answer on StackOverflow, in which he explains in detail how the foreach control construct works under the hood.
Since foreach sometimes behaves in rather strange ways, I thought it would be useful to translate that answer.

Attention: this text assumes basic knowledge of how zvals work in PHP; in particular, you should know what refcount and is_ref are.
foreach works with entities of different types: arrays, plain objects (whose accessible properties are iterated) and Traversable objects (or rather, objects that have an internal get_iterator handler). Here we mostly talk about arrays, but I will cover the rest at the very end.

Before we begin, a few words about arrays and how they are traversed, since this matters for understanding the context.

How arrays work



Arrays in PHP are ordered hash tables (the hash elements are linked into a doubly linked list), and foreach traverses the array in that order.
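
To see the ordering in practice, here is a minimal sketch (the variable names are made up for illustration). Elements come back in the order they were inserted, not in key order:

```php
<?php
// Keys are inserted deliberately out of numeric order.
$array = [];
$array[3] = 'c';
$array[1] = 'a';
$array['x'] = 'z';
$array[2] = 'b';

// Collect the keys in the order foreach visits them.
$order = [];
foreach ($array as $key => $value) {
    $order[] = $key;
}

echo implode(' ', $order), "\n"; // 3 1 x 2
```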

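PHP has two mechanisms for traversing an array: the internal array pointer, which is stored in the HashTable itself, and an external array pointer, which is kept outside of it. The discussion below relies on one difference between them: the internal pointer stays valid when the array is modified during traversal, while an external pointer does not.

Thus, external array pointers can only be used when you are completely sure that no user code will run during the traversal. And such code may hide in the most unexpected places, such as an error handler or a destructor. That is why in most cases PHP has to use the internal pointer instead of an external one. Otherwise, PHP could crash with a segmentation fault as soon as the user started doing something unusual.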

The problem with the internal pointer is that it is part of the HashTable. So when you change it, the HashTable changes with it. And since arrays in PHP have by-value semantics (not by-reference), you have to copy the array in order to loop over its elements.

A simple example showing why copying matters (and not a rare one, by the way) is nested iteration:

    foreach ($array as $a) {
        foreach ($array as $b) {
            // ...
        }
    }


Here you want both loops to be independent, rather than sharing a single pointer in some crafty way.
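
A small sketch of what independent loops mean in practice, using a hypothetical three-element array. The inner loop always runs over the whole array, regardless of where the outer loop currently is:

```php
<?php
$array = [1, 2, 3];

// Record every (outer, inner) combination that the two loops produce.
$pairs = [];
foreach ($array as $a) {
    foreach ($array as $b) {
        $pairs[] = "$a$b";
    }
}

echo implode(' ', $pairs), "\n"; // 11 12 13 21 22 23 31 32 33
```

If the two loops shared one pointer, the inner loop would exhaust it and the outer loop would stop after its first element.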

And that brings us to foreach.

Foreach array traversal



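Now you know why foreach has to create a copy of the array before traversing it. But this is clearly not the whole story. Whether PHP actually makes the copy depends on several factors: if the array is a reference (is_ref = 1), it is not copied and only gets an addref; if the array has refcount = 1, it is not copied either, since nothing else uses it; if its refcount is greater than 1 and it is not a reference, it is shared with other variables and foreach has to copy it.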



So that is the first part of the mystery: the copying behavior. The second part is how the actual iteration is performed, and it is also rather strange. The “ordinary” iteration pattern that you already know (and which is often used in PHP, apart from foreach) looks like this (pseudo-code):

    reset();
    while (get_current_data(&data) == SUCCESS) {
        code();
        move_forward();
    }

foreach iteration looks a little different:

    reset();
    while (get_current_data(&data) == SUCCESS) {
        move_forward();
        code();
    }


The difference is that move_forward() is executed at the beginning of the loop body and not at the end. Thus, while the user code is working with element $i, the internal array pointer already points to element $i + 1.
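
The two patterns can be compared with a toy model. This is only a sketch: both functions are hypothetical, not part of PHP, and $pointer merely stands in for the internal array pointer. Each function records where the pointer is while the loop body runs:

```php
<?php
// "Ordinary" pattern: run the body, then advance the pointer.
function pointerSeenByOrdinaryLoop(array $items): array
{
    $seen = [];
    $pointer = 0;                        // reset()
    while ($pointer < count($items)) {   // get_current_data() succeeded
        $current = $items[$pointer];
        $seen[] = $pointer;              // code(): pointer is on the current element
        $pointer++;                      // move_forward()
    }
    return $seen;
}

// foreach pattern: advance the pointer first, then run the body.
function pointerSeenByForeachStyleLoop(array $items): array
{
    $seen = [];
    $pointer = 0;                        // reset()
    while ($pointer < count($items)) {   // get_current_data() succeeded
        $current = $items[$pointer];
        $pointer++;                      // move_forward()
        $seen[] = $pointer;              // code(): pointer is already one ahead
    }
    return $seen;
}

$items = ['a', 'b', 'c'];
echo implode(' ', pointerSeenByOrdinaryLoop($items)), "\n";     // 0 1 2
echo implode(' ', pointerSeenByForeachStyleLoop($items)), "\n"; // 1 2 3
```

The sketch uses return type declarations, so it assumes PHP 7 or later.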

This mode of operation is also the reason why the internal array pointer moves to the next element when the current one is removed, and not to the previous one (as you might expect). It is all designed to work nicely with foreach (but obviously it does not work as nicely with everything else, which ends up skipping elements).

Implications for code



The first consequence of the behavior described above is that in many cases foreach copies the array being iterated (which is slow). But don't worry: I tried removing the copying requirement and could not see any speedup anywhere except in artificial benchmarks (where iteration became twice as fast). It seems people simply don't iterate enough.

The second consequence is that there usually should be no other consequences. The behavior of foreach is, in general, quite understandable to the user and simply works as it should. You should not have to care about how the copying happens (or whether it happens at all), or at what exact moment the pointer is advanced.

And the third consequence, and this is where your problem comes in, is that sometimes we do see very strange behavior that is hard to understand. This happens specifically when you try to modify the very array you are iterating over in the loop.

A large collection of edge cases that occur when you modify an array during iteration can be found in the PHP test suite. You can start with this test, then change 012 to 013 in the address, and so on. You will see how foreach behaves in different situations (all sorts of combinations of references, etc.).

Now back to your examples:

    foreach ($array as $item) {
        echo "$item\n";
        $array[] = $item;
    }
    print_r($array);

    /* Output in loop:    1 2 3 4 5
       $array after loop: 1 2 3 4 5 1 2 3 4 5 */


Here, $array has refcount = 1 before the loop, so it is not copied, but gets an addref. As soon as you assign to $array[], the zval is separated, so the array you append elements to and the array being iterated are two different arrays.

    foreach ($array as $key => $item) {
        $array[$key + 1] = $item + 2;
        echo "$item\n";
    }
    print_r($array);

    /* Output in loop:    1 2 3 4 5
       $array after loop: 1 3 4 5 6 7 */


The same situation as in the first test.

    // Move the pointer forward by one to make sure it does not affect foreach
    var_dump(each($array));
    foreach ($array as $item) {
        echo "$item\n";
    }
    var_dump(each($array));

    /* Output:
       array(4) { [1]=> int(1) ["value"]=> int(1) [0]=> int(0) ["key"]=> int(0) }
       1 2 3 4 5
       bool(false) */


Again the same story. During the foreach loop you have refcount = 1 and only an addref happens, so the internal pointer of $array is modified. At the end of the loop the pointer is NULL (meaning the iteration is complete). each demonstrates this by returning false.

    foreach ($array as $key => $item) {
        echo "$item\n";
        each($array);
    }
    /* Output: 1 2 3 4 5 */

    foreach ($array as $key => $item) {
        echo "$item\n";
        reset($array);
    }
    /* Output: 1 2 3 4 5 */


The functions each and reset both take their argument by reference. $array has refcount = 2 when it is passed to them, so it has to be separated. Again, foreach will work on a separate array.

But these examples are not convincing enough. The behavior becomes truly unintuitive when you use current in the loop:

    foreach ($array as $val) {
        var_dump(current($array));
    }
    /* Output: 2 2 2 2 2 */


Here you should keep in mind that current also takes its argument by reference, even though it does not modify the array. This is necessary for it to behave consistently with all the other functions, like next, which take the array by reference (current is actually a prefer-ref function: it can accept a value, but uses a reference if it can). Passing by reference means that the array has to be separated, so $array and the copy of $array that foreach uses become independent. Why you get 2 and not 1 was also explained above: foreach advances the array pointer before running the user code, not after. So even though the code is still working with the first element, foreach has already moved the pointer to the second.

Now let's make a small change:

    $ref = &$array;
    foreach ($array as $val) {
        var_dump(current($array));
    }
    /* Output: 2 3 4 5 false */


Here we have is_ref = 1, so the array is not copied (just as above). But now that it is a reference, the array no longer needs to be separated when it is passed by reference to current. Now current and foreach work on the same array. You still see the off-by-one behavior, though, because of the way foreach handles the pointer.

You will see the same thing when you iterate by reference:

    foreach ($array as &$val) {
        var_dump(current($array));
    }
    /* Output: 2 3 4 5 false */


The important thing here is that foreach sets is_ref = 1 on $array when it iterates over it by reference, so the situation is the same as above.

Another small variation: here we assign the array to another variable:

    $foo = $array;
    foreach ($array as $val) {
        var_dump(current($array));
    }
    /* Output: 1 1 1 1 1 */


Here the refcount of $array is 2 when the loop starts, so this time the copy has to be made up front. Thus $array and the array used by foreach are different from the very beginning. That is why you get the position of the internal array pointer as it was before the loop (in this case, at the first element).

Object Iteration



When iterating objects, it makes sense to consider two cases:

The object is not Traversable (or rather, the internal get_iterator handler is not defined)


In this case, iteration works almost the same way as for arrays. The same copy semantics apply. The only difference is that foreach runs some additional code to skip properties that are not accessible in the current scope. A couple more interesting facts:
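
As a small illustration of the scope-dependent filtering mentioned above, here is a sketch with a hypothetical Account class. The same object yields different properties depending on where the loop runs:

```php
<?php
// Hypothetical class with one property of each visibility.
class Account
{
    public $name = 'alice';
    protected $email = 'alice@example.com';
    private $token = 'secret';

    // Iterating $this inside the class sees every property.
    public function propertiesVisibleInside(): array
    {
        $names = [];
        foreach ($this as $property => $value) {
            $names[] = $property;
        }
        return $names;
    }
}

$account = new Account();

// Iterating from outside the class only sees public properties.
$outside = [];
foreach ($account as $property => $value) {
    $outside[] = $property;
}

echo implode(' ', $outside), "\n";                             // name
echo implode(' ', $account->propertiesVisibleInside()), "\n";  // name email token
```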



Traversable object


In this case, practically nothing said above applies. PHP will not copy anything and will not use tricks like advancing the pointer before the loop body runs. I think that iteration over a Traversable object is much more predictable and needs no further description.
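
To make the order of calls visible, here is a minimal sketch with a hypothetical LoggingIterator class that records each Iterator method foreach invokes. It assumes PHP 8.0 or later because of the mixed return type; on older versions the type declarations would have to be dropped:

```php
<?php
// Hypothetical helper: logs the Iterator methods that foreach calls.
final class LoggingIterator implements Iterator
{
    public array $log = [];
    private array $items;
    private int $position = 0;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    public function rewind(): void
    {
        $this->log[] = 'rewind';
        $this->position = 0;
    }

    public function valid(): bool
    {
        $this->log[] = 'valid';
        return $this->position < count($this->items);
    }

    public function current(): mixed
    {
        $this->log[] = 'current';
        return $this->items[$this->position];
    }

    public function key(): mixed
    {
        return $this->position; // not logged, to keep the trace short
    }

    public function next(): void
    {
        $this->log[] = 'next';
        $this->position++;
    }
}

$iterator = new LoggingIterator(['a', 'b']);
foreach ($iterator as $value) {
    $iterator->log[] = "body:$value";
}

echo implode(' ', $iterator->log), "\n";
// rewind valid current body:a next valid current body:b next valid
```

Note that next is called only after the loop body has finished, which is the "ordinary" pattern rather than the advance-first pattern described for arrays.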

Replacing the iterated entity during the loop



Another unusual case that I have not mentioned is that PHP allows the iterated entity to be replaced during the loop. You can start with one array and then replace it with another halfway through. Or start with an array and then replace it with an object:

    $arr = [1, 2, 3, 4, 5];
    $obj = (object) [6, 7, 8, 9, 10];

    $ref =& $arr;
    foreach ($ref as $val) {
        echo "$val\n";
        if ($val == 3) {
            $ref = $obj;
        }
    }
    /* Output: 1 2 3 6 7 8 9 10 */


As you can see, PHP simply started traversing the other entity as soon as the replacement happened.

Changing the internal array pointer during iteration



One last detail of foreach behavior that I have not mentioned yet (because it can be used to get truly weird behavior): what happens if you try to change the internal array pointer during the loop.

Here you may not get what you expected: if you call next or prev in the loop body (in the by-reference case), you will see that the internal pointer has moved, but this has no effect on the iteration. The reason is that after each iteration foreach backs up the current position and the hash of the current element in a HashPointer. On the next iteration, foreach checks whether the position of the internal pointer has changed and tries to restore it using this hash.

Let's see what “tries” means. The first example shows that changing the internal pointer does not change the behavior of foreach:

    $array = [1, 2, 3, 4, 5];
    $ref =& $array;
    foreach ($array as $value) {
        var_dump($value);
        reset($array);
    }
    // output: 1, 2, 3, 4, 5


Now let's try to unset the element that foreach will be positioned at when the first iteration finishes (key 1):

    $array = [1, 2, 3, 4, 5];
    $ref =& $array;
    foreach ($array as $value) {
        var_dump($value);
        unset($array[1]);
        reset($array);
    }
    // output: 1, 1, 3, 4, 5


Here you can see that the iteration was reset, because no element with a matching hash could be found.

Keep in mind that a hash is just a hash. Collisions happen. Now let's try this:

    $array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
    $ref =& $array;
    foreach ($array as $value) {
        unset($array['EzFY']);
        $array['FYFZ'] = 4;
        reset($array);
        var_dump($value);
    }
    // output: 1 1 3 4


This works as expected. We deleted the EzFY key (the one foreach was positioned at), so the iteration was reset. We also added a new key, so we see 4 at the end.

And now comes the weird part. What happens if we replace the FYFZ key with FYFY? Let's try:

    $array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
    $ref =& $array;
    foreach ($array as $value) {
        unset($array['EzFY']);
        $array['FYFY'] = 4;
        reset($array);
        var_dump($value);
    }
    // output: 1 4


Now the loop went straight to the new element, skipping everything else. This is because the FYFY key collides with EzFY (in fact, all the keys in this array collide). Moreover, the FYFY element happens to be located at the same memory address as the EzFY element that was just deleted. So to PHP it looks like the same position with the same hash. The position is therefore “restored”, and the iteration jumps to the end of the array.

Source: https://habr.com/ru/post/172647/

