Working with memory (and yet it does exist)

There is a widespread opinion that the “ordinary” PHP developer practically does not need to worry about memory management. However, “not worrying” and “not knowing” are still two different things. I will try to highlight some aspects of memory management when working with variables and arrays, as well as some interesting “pitfalls” of PHP's internal optimizations. Optimization is good, but if you don't know exactly how it “optimizes”, you can step on non-obvious rakes that can make you quite nervous.

General information

A quick primer

A variable in PHP consists of two parts: the “name”, which is stored in the symbol_table hash table, and the “value”, which is stored in a zval container.
This mechanism allows several variables to refer to one value, which in some cases helps to reduce memory consumption. How this looks in practice is shown further on.

The most common operations, without which it is hard to imagine any non-trivial script, are:
- creating, assigning and deleting variables (numbers, strings, etc.),
- creating arrays and traversing them (foreach is used as the example),
- passing values to functions/methods and returning values from them.
The rest of the article covers these aspects of working with memory. It turned out rather long, but there is nothing overly complex here: everything is fairly simple, explicit, and illustrated with examples.

The first example of working with memory

First, a basic example of how memory consumption will be analyzed.
For this we need a couple of simple functions (the func.php file):

<?php
function memoryUsage($usage, $base_memory_usage) {
    printf("Bytes diff: %d\n", $usage - $base_memory_usage);
}
function someBigValue() {
    return str_repeat('SOME BIG STRING', 1024);
}
?>
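On a modern engine the same measuring idea can be wrapped into one reusable function. This is a minimal sketch assuming PHP 7.4 or later (for the short fn closure syntax); memoryCostOf() is a hypothetical helper, not part of the original func.php, and its byte counts will not match the PHP 5.3.2 figures quoted in this article.

```php
<?php
// Hypothetical helper (not part of the original func.php), assuming PHP 7.4+.
// Returns how many bytes are still held once the value built by $make exists.
function memoryCostOf(callable $make): int
{
    $base = memory_get_usage();          // usage before the value is built
    $value = $make();                    // keep the result alive in a local variable
    $cost = memory_get_usage() - $base;  // memory attributable to $value
    unset($value);                       // release it before returning
    return $cost;
}

printf("Bytes held by a 100000-byte string: %d\n",
    memoryCostOf(fn() => str_repeat('x', 100000)));
```

Measuring inside a function keeps the bookkeeping variables ($base, $cost) in the function's own frame, which on PHP 7+ typically keeps them out of the difference itself.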

Here is a first, simple memory consumption test for a string:

<?php
include('func.php');
echo "String memory usage test.\n\n";
// Called twice on purpose: the first call creates $base_memory_usage itself,
// so the second reading already includes the memory used by that variable.
$base_memory_usage = memory_get_usage();
$base_memory_usage = memory_get_usage();

echo "Start\n";
memoryUsage(memory_get_usage(), $base_memory_usage);

$a = someBigValue();

echo "String value setted\n";
memoryUsage(memory_get_usage(), $base_memory_usage);

unset($a);

echo "String value unsetted\n";
memoryUsage(memory_get_usage(), $base_memory_usage);
?>

Note: this code is certainly not written for efficiency. What matters here is making memory consumption visible, and that is the whole point of this layout.

The output is quite predictable:

String memory usage test.

Start
Bytes diff: 0
String value setted
Bytes diff: 15448
String value unsetted
Bytes diff: 0

The same example, but using $a = null; instead of unset($a):

Start
Bytes diff: 0
String value setted
Bytes diff: 15448
String value set to null
Bytes diff: 76

As you can see, the variable was not completely destroyed: another 76 bytes are still allocated for it.
That is quite a lot, considering that exactly the same amount is allocated for boolean, integer and float variables. This is not the memory for the value as such, but the total cost of storing information about the assigned variable (the zval container holding the value, plus the variable name itself).
So if you free memory by assignment, it does not matter whether you assign null specifically: $a = 10000; gives exactly the same memory consumption.
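A sketch of the same comparison on a modern engine, assuming PHP 7 or later; nullVersusUnset() is a hypothetical function name. Because the variables here are function locals, the fixed per-variable overhead measured above is not expected to show up; the point is only that both $a = null and unset($a) give the big string's memory back.

```php
<?php
// Sketch, assuming PHP 7+: unset() versus assigning null for a large string.
// nullVersusUnset() is a hypothetical name.
function nullVersusUnset(): array
{
    $base = memory_get_usage();

    $a = str_repeat('SOME BIG STRING', 1024);
    $afterSet = memory_get_usage() - $base;

    $a = null;                                  // the string is released, $a still exists
    $afterNull = memory_get_usage() - $base;

    $a = str_repeat('SOME BIG STRING', 1024);
    unset($a);                                  // the string and the variable are gone
    $afterUnset = memory_get_usage() - $base;

    return [$afterSet, $afterNull, $afterUnset];
}

[$afterSet, $afterNull, $afterUnset] = nullVersusUnset();
printf("set: %d, null: %d, unset: %d\n", $afterSet, $afterNull, $afterUnset);
```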

The PHP documentation states that casting to null destroys the variable and its value; this script shows that it does not, which is effectively a documentation bug.

Why assign null when you can use unset()?
Assignment is just assignment (thanks, Captain Obvious): the variable's value changes, so if the new value needs less memory, the difference is released immediately. This costs some computation, although relatively little.
unset(), in turn, frees the memory allocated for both the variable name and its value.
It is worth mentioning separately that unset() and assigning null behave quite differently with references. unset() destroys only that one reference (the name), while assigning null changes the value the names point to, so all of the variables will then refer to null.
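The difference with references can be checked directly. This small sketch (plain PHP, no memory measurement) shows that unset() removes only one name of a reference pair, while assigning null through either name changes the shared value:

```php
<?php
// unset() removes only one name of a reference pair.
$first = 'value';
$alias = &$first;
unset($alias);              // $first keeps its value
$afterUnset = $first;

// Assigning null writes through the reference, so every name sees null.
$second = 'value';
$alias2 = &$second;
$alias2 = null;             // $second is now null as well
$afterNull = $second;

var_dump($afterUnset, $afterNull);   // string(5) "value", then NULL
```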

Note:
There is a misconception that unset() is a function, but it is not. unset() is a language construct (like if), as the documentation explicitly states, so it cannot be called through a variable:

$unset_func_name = 'unset';
$unset_func_name($some_var); // fails: there is no function named unset
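Since unset is not a function, PHP does not report it as callable either. If a callable really is needed, the construct can be wrapped in a closure; $unsetKey below is a hypothetical name used only for illustration:

```php
<?php
// unset is a language construct, so PHP does not treat it as a function.
var_dump(function_exists('unset'));   // bool(false)
var_dump(is_callable('unset'));       // bool(false)

// Hypothetical wrapper: a closure that can be stored and called through a variable.
$unsetKey = function (array &$array, $key): void {
    unset($array[$key]);
};

$data = ['a' => 1, 'b' => 2];
$unsetKey($data, 'a');                // the array is passed by reference
var_dump(array_keys($data));          // only the key "b" is left
```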

Some extra food for thought (obtained by modifying the example above):
$a = array();
allocates 164 bytes; unset($a) returns all of it.

class A {}
$a = new A();
allocates 184 bytes; unset($a) returns all of it.

$a = new stdClass();
allocates 272 bytes, but after unset($a) 88 bytes “leak” (I have not yet been able to figure out exactly where or why).

So far none of this is critical for memory consumption, since strings and numbers are stored and processed in a fairly obvious way. Things get much worse with arrays (objects have their own quirks as well, but they deserve a separate article).

Arrays

Arrays in PHP “eat” a lot of memory, and they are usually where large amounts of data are kept during processing, so you should handle them carefully. Working with arrays in PHP also has its own “optimization charm”, and one memory-related quirk is worth mentioning.
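To get a feel for how much an array “eats”, here is a rough sketch, assuming PHP 7 or later, that divides the memory held by an array of integers by its element count; arrayCostPerElement() is a hypothetical helper. Each element needs at least a 16-byte zval plus table overhead, so the result should be noticeably larger than the 8 bytes of a 64-bit integer; the exact figure depends on the PHP version.

```php
<?php
// Rough sketch, assuming PHP 7+: average bytes per element of an integer array.
// arrayCostPerElement() is a hypothetical helper for illustration.
function arrayCostPerElement(int $count): float
{
    $base = memory_get_usage();
    $array = range(1, $count);            // $count integers in a list-style array
    $bytes = memory_get_usage() - $base;  // memory held by the array
    unset($array);
    return $bytes / $count;
}

printf("%.1f bytes per element (a 64-bit int itself is 8 bytes)\n",
    arrayCostPerElement(100000));
```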

Insidious example number 1

<?php
include('func.php');
echo "Array memory usage example.";
$base_memory_usage = memory_get_usage();
$base_memory_usage = memory_get_usage();

echo 'Base usage.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

$a = array(someBigValue(), someBigValue(), someBigValue(), someBigValue());

echo 'Array is set.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

foreach ($a as $k => $v) {
    $a[$k] = someBigValue();
    unset($k, $v);
    echo 'In FOREACH cycle.' . PHP_EOL;
    memoryUsage(memory_get_usage(), $base_memory_usage);
}

echo 'Usage right after FOREACH.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

unset($a);
echo 'Array unset.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);
?>

At first glance it may seem that the memory used by the $a array will not change (apart from the variables $k and $v), but PHP handles arrays in a special way in this situation.

Look at the output:

Array memory usage example.Base usage.
Bytes diff: 0
Array is set.
Bytes diff: 61940
In FOREACH cycle.
Bytes diff: 77632
In FOREACH cycle.
Bytes diff: 93032
In FOREACH cycle.
Bytes diff: 108432
In FOREACH cycle.
Bytes diff: 123832
Usage right after FOREACH.
Bytes diff: 61940
Array unset.
Bytes diff: 0

It turns out that by the last iteration of the foreach loop the array's memory consumption has doubled, although nothing in the code suggests this. Yet right after the loop, consumption drops back to its previous value. Pure magic.
The reason is an optimization of how arrays are used in loops. When you modify the original array inside the loop, PHP implicitly creates a copy of the array structure (but not of the values); this copy becomes the array once the loop ends, and the original structure is destroyed. So in the example above, new values assigned to the array do not replace the old ones right away: they get separate memory, and the old values are released only when the loop exits.
This is very easy to overlook, and it can lead to significant memory consumption inside loops over large data sets, for example rows fetched from a database.
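The same effect can still be observed on current engines. A minimal sketch, assuming PHP 7 or later, where foreachByValuePeak() is a hypothetical function and 100000-byte strings stand in for someBigValue() so that the difference dominates any noise:

```php
<?php
// Sketch, assuming PHP 7+: by-value foreach with writes to the iterated array.
// foreachByValuePeak() is a hypothetical name; sizes are chosen to dominate noise.
function foreachByValuePeak(): array
{
    $a = [];
    for ($i = 0; $i < 4; $i++) {
        $a[] = str_repeat('x', 100000);
    }
    $base = memory_get_usage();
    $peak = 0;

    foreach ($a as $k => $v) {
        $a[$k] = str_repeat('y', 100000);  // first write separates $a from the loop's copy
        unset($v);
        $peak = max($peak, memory_get_usage() - $base);
    }
    $after = memory_get_usage() - $base;   // the loop's copy and the old strings are gone

    return [$peak, $after];
}

[$peak, $after] = foreachByValuePeak();
printf("Peak inside loop: %d, after loop: %d\n", $peak, $after);
```

On such engines a by-value loop iterates over a shared copy of the array, so the first write to $a separates it, and the old strings stay alive through that copy until the loop ends.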

Note:
Inside the loop, after changing $a[$k], you can no longer get the value still stored in the original array unless you kept it in $v. Reading $a[$k] again returns the new value.

An addition from user zibada (in short):
It is important to note that when a change happens, memory for the new “temporary array” is allocated for the whole array structure at once, but separately for each changed element value. So for an array with a large number of elements (not necessarily large ones), the one-time memory cost of such copying will be significant.
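zibada's point about copying the whole structure at once can be sketched with many small elements instead of a few large ones. This assumes PHP 7 or later; structureCopyCost() is a hypothetical name. A single write on the first iteration is enough to duplicate the table of 100000 integers:

```php
<?php
// Sketch, assuming PHP 7+: one write inside a by-value foreach duplicates the
// whole array table, even though only a single small element changes.
// structureCopyCost() is a hypothetical name.
function structureCopyCost(int $count): array
{
    $a = range(1, $count);
    $base = memory_get_usage();
    $during = 0;

    foreach ($a as $k => $v) {
        $a[$k] = 0;                        // triggers the copy of the entire table
        $during = memory_get_usage() - $base;
        break;                             // one change is enough
    }
    $after = memory_get_usage() - $base;   // the original table is freed with the loop

    return [$during, $after];
}

[$during, $after] = structureCopyCost(100000);
printf("During loop: %d, after loop: %d\n", $during, $after);
```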

Insidious example number 2
Let's change the code slightly:

echo 'Array is set.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);
$b = &$a; // Add this
foreach ($a as $k => $v) {
    $a[$k] = someBigValue();
    unset($k, $v);
    echo 'In FOREACH cycle.' . PHP_EOL;
    memoryUsage(memory_get_usage(), $base_memory_usage);
}
unset($b); // And this
echo 'Usage right after FOREACH.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

We did not touch the loop itself; the only change is an extra reference to the original array, yet it radically changes how the loop behaves:

Bytes diff: 0
Array is set.
Bytes diff: 61940
In FOREACH cycle.
Bytes diff: 61988
In FOREACH cycle.
Bytes diff: 61988
In FOREACH cycle.
Bytes diff: 61988
In FOREACH cycle.
Bytes diff: 61988
Usage right after FOREACH.
Bytes diff: 61940
Array unset.
Bytes diff: 0

A small increase (61988 - 61940 = 48 bytes) goes to storing the reference variable $b.
Otherwise, we see that if the array being iterated has more than one reference to it, the optimization from example 1 does not apply, i.e. the assignments go to the original array.
We get exactly the same result if we iterate over $b instead, or take the value by reference in the loop:

echo 'Array is set.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

foreach ($a as $k => &$v) {
    $a[$k] = someBigValue(); // Or $v = someBigValue();
    unset($k, $v);
    echo 'In FOREACH cycle.' . PHP_EOL;
    memoryUsage(memory_get_usage(), $base_memory_usage);
}

echo 'Usage right after FOREACH.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

Result:

Bytes diff: 0
Array is set.
Bytes diff: 61940
In FOREACH cycle.
Bytes diff: 61940
In FOREACH cycle.
Bytes diff: 61940
In FOREACH cycle.
Bytes diff: 61940
In FOREACH cycle.
Bytes diff: 61940
Usage right after FOREACH.
Bytes diff: 61940
Array unset.
Bytes diff: 0

Note that taking $v by reference does not increase the reference count of the original array, yet it also turns the “optimization” off.
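For comparison, a sketch of the by-reference loop under the same assumptions (PHP 7 or later, a hypothetical foreachByReferencePeak(), 100000-byte strings). Here each old string is expected to be freed as soon as it is overwritten, so memory stays close to the starting level:

```php
<?php
// Sketch, assuming PHP 7+: by-reference foreach modifies the array in place.
// foreachByReferencePeak() is a hypothetical name.
function foreachByReferencePeak(): array
{
    $a = [];
    for ($i = 0; $i < 4; $i++) {
        $a[] = str_repeat('x', 100000);
    }
    $base = memory_get_usage();
    $peak = 0;

    foreach ($a as &$v) {
        $v = str_repeat('y', 100000);      // the old string is freed right away
        $peak = max($peak, memory_get_usage() - $base);
    }
    unset($v);                             // break the reference to the last element
    $after = memory_get_usage() - $base;

    return [$peak, $after];
}

[$peak, $after] = foreachByReferencePeak();
printf("Peak inside loop: %d, after loop: %d\n", $peak, $after);
```

The unset($v) after the loop is the usual precaution, so that a later write to $v does not silently change the last element.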

Passing by reference or by copy

Consider what to do if you need to pass a very large value to a function or method (or return one from it). The first obvious solution is usually to pass/return it by reference.
However, the PHP documentation says: “Do not use return-by-reference to increase performance. The engine will automatically optimize this on its own.”
Let us try to understand what this “optimization” is.

Let's start with the simplest example (no function arguments yet):

...
$a = someBigValue();
$b = $a;

echo "String value setted";
memoryUsage(memory_get_usage(), $base_memory_usage);

unset($a, $b);
...

Following “straightforward logic”, two blocks of memory should be allocated for the variables' values. However, PHP optimizes this case:

Start
Bytes diff: 0
String value setted
Bytes diff: 15496
String value unsetted
Bytes diff: 0

Here 15448 bytes are taken by $a, and the remaining 48 bytes go to $b, even though there is no reference between them. This saving lasts until we change one of these variables, or rather, until we do almost anything with its value, even without actually changing it:

$a = someBigValue();
$b = $a;
$b = strval($b);

echo "String value setted";
memoryUsage(memory_get_usage(), $base_memory_usage);

unset($a, $b);

As a result, we get the output:

Bytes diff: 0
String value setted
Bytes diff: 30896
String value unsetted
Bytes diff: 0

As we can see, “touching” the value of $b makes the script allocate a separate memory area to store it. The same happens if we “touch” the value of $a.
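A sketch of the same copy-on-write behaviour, assuming PHP 7 or later; copyOnWriteString() is a hypothetical name. It modifies the copy with .= instead of strval(), because converting a string to a string does not necessarily allocate a new buffer on newer engines:

```php
<?php
// Sketch, assuming PHP 7+: copy-on-write for strings.
// copyOnWriteString() is a hypothetical name.
function copyOnWriteString(): array
{
    $base = memory_get_usage();
    $a = str_repeat('x', 200000);
    $created = memory_get_usage() - $base;

    $b = $a;                           // $b shares the same string buffer
    $copied = memory_get_usage() - $base;

    $b .= '!';                         // the write gives $b its own buffer
    $written = memory_get_usage() - $base;

    return [$created, $copied, $written];
}

[$created, $copied, $written] = copyOnWriteString();
printf("created: %d, copied: %d, written: %d\n", $created, $copied, $written);
```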

This optimization works on individual values, including the individual elements of an array.
To see this better, look at the example below:

$a = array(someBigValue(), someBigValue()); // 31052 bytes
$b = $a; // + 48 bytes = 31100 bytes
$b[0] = someBigValue();

echo "String value setted";
memoryUsage(memory_get_usage(), $base_memory_usage);

unset($a, $b);

This example will give the output:

Bytes diff: 0
String value setted
Bytes diff: 46704
String value unsetted
Bytes diff: 0

That is, new memory (15K+ bytes) was allocated only to copy the value of element 0, not the whole $b array. The value of $b[1] is still shared with $a[1] thanks to the optimization.
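The element-level behaviour as a sketch, assuming PHP 7 or later (copyOnWriteArray() is a hypothetical name): writing one element of the copy should cost roughly one new string plus a small duplicated table, not the whole array.

```php
<?php
// Sketch, assuming PHP 7+: copy-on-write works per element of an array.
// copyOnWriteArray() is a hypothetical name.
function copyOnWriteArray(): array
{
    $a = [str_repeat('x', 200000), str_repeat('x', 200000)];
    $base = memory_get_usage();

    $b = $a;                            // the whole array is shared
    $copied = memory_get_usage() - $base;

    $b[0] = str_repeat('y', 200000);    // new table for $b plus one new string;
                                        // $b[1] still shares $a[1]'s string
    $written = memory_get_usage() - $base;

    return [$copied, $written];
}

[$copied, $written] = copyOnWriteArray();
printf("after copy: %d, after write: %d\n", $copied, $written);
```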

Everything described above works the same way when values are passed into or returned from functions and methods through “optimized copying”. If you do not “touch” the passed value inside the method, no separate memory is allocated for it (memory is allocated only for the variable name, to bind it to the value). If you pass a value “by copy” and change it inside the method, a full copy of the value is made right before the change.

Thus PHP really does remove the need to pass by reference just to save memory. Passing by reference only makes sense when changes to the original value must be visible outside the method.

Example code:

<?php
include('func.php');

function testUsageInside($big_value, $base_memory_usage) {
    echo 'Usage inside function then $big_value NOT changed.' . PHP_EOL;
    memoryUsage(memory_get_usage(), $base_memory_usage);

    $big_value[0] = someBigValue();
    echo 'Usage inside function then $big_value[0] changed.' . PHP_EOL;
    memoryUsage(memory_get_usage(), $base_memory_usage);

    $big_value[1] = someBigValue();
    echo 'Usage inside function then also $big_value[1] changed.' . PHP_EOL;
    memoryUsage(memory_get_usage(), $base_memory_usage);
}

echo 'Array memory usage example.' . PHP_EOL;
$base_memory_usage = memory_get_usage();
$base_memory_usage = memory_get_usage();

echo 'Base usage.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

$a = array(someBigValue(), someBigValue(), someBigValue(), someBigValue());

echo 'Array is set.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

testUsageInside($a, $base_memory_usage);

echo 'Usage right after function call.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);

unset($a);
echo 'Array unset.' . PHP_EOL;
memoryUsage(memory_get_usage(), $base_memory_usage);
?>

Output:

Array memory usage example.
Base usage.
Bytes diff: 0
Array is set.
Bytes diff: 61940
Usage inside function then $big_value NOT changed.
Bytes diff: 61940
Usage inside function then $big_value[0] changed.
Bytes diff: 77632
Usage inside function then also $big_value[1] changed.
Bytes diff: 93032
Usage right after function call.
Bytes diff: 61940
Array unset.
Bytes diff: 0

As the example shows, the function did not create a copy of the array, even though the value is formally passed by copy. And even partially modifying the passed array did not create a full copy: memory was allocated only for the new values.
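The same check for a function argument, as a sketch assuming PHP 7 or later; inspect() is a hypothetical function that measures memory inside its own body:

```php
<?php
// Sketch, assuming PHP 7+: an array passed by value stays shared with the
// caller until the function writes to it. inspect() is a hypothetical name.
function inspect(array $bigValue, int $base): array
{
    $untouched = memory_get_usage() - $base;   // nothing copied yet
    $bigValue[0] = str_repeat('y', 200000);    // a new table plus one new string
    $afterWrite = memory_get_usage() - $base;
    return [$untouched, $afterWrite];
}

$a = [str_repeat('x', 200000), str_repeat('x', 200000)];
$base = memory_get_usage();
[$untouched, $afterWrite] = inspect($a, $base);
printf("untouched: %d, after write: %d\n", $untouched, $afterWrite);
```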

Purely for educational purposes, note these two values:

Array is set.
Bytes diff: 61940
Usage inside function then $big_value NOT changed.
Bytes diff: 61940

Memory consumption did not grow when control passed to the function, even though a new variable $big_value appeared. This is because at the parsing stage the interpreter determined that the function would be used in the code and allocated space for the names of its parameters in advance (if a function is never used, the interpreter ignores it and allocates no memory for it). And since “optimized passing by copy” applies, the already existing name $big_value was simply bound implicitly to the large array $a. As a result, the value was passed to the function “by copy” without spending a single extra byte.

Note:
In PHP 5 (unlike PHP 4), objects behave as if they were passed by reference by default, although strictly speaking this is not a true reference: what is passed is a copy of the object handle. See this article.
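The difference between a true reference and a passed object handle is easy to see. A minimal sketch, where modifyObject() is a hypothetical function: changing a property through the parameter is visible to the caller, but assigning a new object to the parameter only rebinds the local name.

```php
<?php
// Objects are passed by value, but the value is an object handle.
// modifyObject() is a hypothetical function for illustration.
function modifyObject(stdClass $obj): void
{
    $obj->name = 'changed';     // same object as the caller's: visible outside
    $obj = new stdClass();      // rebinds only the local parameter
    $obj->name = 'replaced';    // affects the new local object only
}

$user = new stdClass();
$user->name = 'original';
modifyObject($user);
echo $user->name, PHP_EOL;      // changed
```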

Brief conclusions

No doubt these examples of memory optimization in PHP are only a drop in the ocean, but they cover the most common cases where it makes sense to think about which code to write in order to save memory and spare yourself headaches.

It would also be worth covering how memory is spent and optimized when working with objects, but given the number of possible examples, that topic needs a separate article. Maybe someday.

PS: I could have split this into several articles, but I see no point: such information is better kept together, and I believe that is more convenient for those who will actually use it. Everything was tested on PHP 5.3.2 (32-bit Ubuntu), so the byte counts you see may differ.

Much more useful reading, in English:
nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
nikic.github.com/2011/11/11/PHP-Internals-When-does-foreach-copy.html
blog.golemon.com/2007/01/youre-being-lied-to.html
hengrui-li.blogspot.com/2011/08/php-copy-on-write-how-php-manages.html
sldn.softlayer.com/blog/dmcaloon/PHP-Memory-Management-Foreach
blog.preinheimer.com/index.php?/archives/354-Memory-usage-in-PHP.html
derickrethans.nl/talks/phparch-php-variables-article.pdf

UPD
The main part of the article missed an important point.
If a variable has a reference pointing to it, then passing it to a function as an argument copies it immediately, i.e. the copy-on-write optimization is not applied.
Example:

<?php
include('func.php');
function testFunc($a, $base_memory_usage) {
    memoryUsage(memory_get_usage(), $base_memory_usage);
}
$base_memory_usage = 0;
$base_memory_usage = memory_get_usage();
memoryUsage(memory_get_usage(), $base_memory_usage); // 0 bytes
$a = someBigValue();
$b = &$a;
memoryUsage(memory_get_usage(), $base_memory_usage); // 15496 bytes
testFunc($a, $base_memory_usage); // 30896 bytes
memoryUsage(memory_get_usage(), $base_memory_usage); // 15496 bytes
unset($a, $b);
memoryUsage(memory_get_usage(), $base_memory_usage); // 0 bytes
?>

Source: https://habr.com/ru/post/134784/
