
How big are arrays (and values) in PHP? (Hint: VERY BIG)

In this article, I want to examine the memory consumption of arrays (and values in general) in PHP, using the following script as an example. It creates an array of 100,000 unique integers and then measures how much memory was used.



This is a translation (for people like me who often do not notice this).



At the beginning, I want to thank Johannes and Tyrael for their help in finding hidden places to waste memory.



    <?php
    $startMemory = memory_get_usage();
    $array = range(1, 100000);
    echo memory_get_usage() - $startMemory, ' bytes';



How much do you think it will be? If an integer takes 8 bytes (on a 64-bit architecture, using the long type) and there are 100,000 integers, then obviously 800,000 bytes are required. This is about 0.76 MB.



Now try running the code. This can be done online. The result is 14,649,024 bytes. Yes, you read that right: 13.97 MB, 18 times more than we estimated.



So where does this 18-fold increase come from?



Summary





For those who do not want to deal with all of this, here is a brief overview of the components involved.



                                 |  64 bit   |  32 bit
    ---------------------------------------------------
    zval                         |  24 bytes | 16 bytes
    + cyclic GC info             |   8 bytes |  4 bytes
    + allocation header          |  16 bytes |  8 bytes
    ===================================================
    zval (value) total           |  48 bytes | 28 bytes
    ===================================================
    bucket                       |  72 bytes | 36 bytes
    + allocation header          |  16 bytes |  8 bytes
    + pointer                    |   8 bytes |  4 bytes
    ===================================================
    bucket (array element) total |  96 bytes | 48 bytes
    ===================================================
    total total                  | 144 bytes | 76 bytes




The numbers above may vary depending on your operating system, compiler and compilation options. For example, if you compile PHP with debug or thread-safety, you will get different values. But I think that you will see the given sizes on an ordinary PHP 5.3 build on 64-bit Linux.



If you multiply these 144 bytes by our 100,000 elements, you get 14,400,000 bytes, which is 13.73 MB. That is pretty close to the actual result. The rest is mostly pointers for uninitialized buckets, which I will come back to later.



Now, if you want a more detailed analysis of the values listed above, read on :).



The zvalue_value union





First, let's take a look at how PHP stores values. As you know, PHP is a weakly typed language, so it needs a way to quickly switch between types. For this, PHP uses a union, which is defined as follows in zend.h#307 (the comments are mine):



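    typedef union _zvalue_value {
        long lval;                 // for integers and booleans
        double dval;               // for floating-point numbers (doubles)
        struct {                   // for strings
            char *val;             //     the string itself
            int len;               //     and its length
        } str;
        HashTable *ht;             // for arrays (hash tables)
        zend_object_value obj;     // for objects
    } zvalue_value;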




If you do not know C, that is not a problem, the code is very simple: a union means that one value can be interpreted as any of several types. For example, if you access zvalue_value->lval, the value is interpreted as an integer. If you access zvalue_value->ht instead, the value is interpreted as a pointer to a hash table (aka array).



We will not linger on this. What matters for us is that the size of a union equals the size of its largest component. The largest component here is the string structure (the zend_object_value structure is actually the same size, but I will leave it out for simplicity). The string structure consists of a pointer (8 bytes) and an integer (4 bytes), 12 bytes in total. Because of memory alignment (12-byte structures are not cool, because 12 is not a multiple of 64 bits / 8 bytes), the final size of the structure is 16 bytes, and so is the size of the union as a whole.



So, now we know that we need not 8 bytes for each value, but 16 - due to the dynamic typing of PHP. Multiplying by 100,000 yields 1,600,000 bytes, i.e. 1.53 MB. But the real volume is 13.97 MB, so we have not reached the goal yet.
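The effect of alignment on the union can be checked directly with sizeof. The following is a minimal sketch, not the PHP source: value_sketch and object_value_sketch are hypothetical stand-ins that only mirror the member layout shown above (void * replaces HashTable *, and the object member is assumed to be a handle plus a pointer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for zend_object_value: a handle plus a pointer. */
typedef struct {
    unsigned int handle;
    const void *handlers;
} object_value_sketch;

/* Same member layout as zvalue_value, with void * standing in for HashTable *. */
typedef union {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    void *ht;
    object_value_sketch obj;
} value_sketch;

/* Bytes the string member needs before alignment: pointer + int. */
size_t str_member_bytes(void) {
    return sizeof(char *) + sizeof(int);
}

/* Bytes the compiler reserves for the whole union, padding included. */
size_t value_union_bytes(void) {
    size_t size = sizeof(value_sketch);
    /* A union is at least as large as its largest member. */
    assert(size >= str_member_bytes());
    return size;
}
```

On a typical 64-bit Linux build, str_member_bytes() should give 12 and value_union_bytes() should give 16, matching the calculation above. Other platforms may differ.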



Zval structure





It is quite logical that the union stores only the value itself, while PHP obviously also needs to store its type and some information for garbage collection. The structure that holds this information is called zval, and you have probably already heard of it. For more information on why PHP needs it, I recommend reading Sara Golemon's article. The structure is defined as follows:



    struct _zval_struct {
        zvalue_value value;        // the value
        zend_uint refcount__gc;    // number of references to this value (for GC)
        zend_uchar type;           // the type
        zend_uchar is_ref__gc;     // whether this value is a reference (&)
    };




The size of the structure is determined by the sum of the sizes of all its components: zvalue_value - 16 bytes (calculation above), zend_uint - 4 bytes, zend_uchar - 1 byte each. A total of 22 bytes. Again, due to memory alignment, the actual size will be 24 bytes.



So, if we store 100,000 values of 24 bytes each, that is 2,400,000 bytes or 2.29 MB. The gap is shrinking, but the real value is still more than six times larger.
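The two bytes lost to alignment in this structure sit at its end, which offsetof makes visible. This is a sketch under assumptions: zval_sketch is a hypothetical copy of the layout above, with unsigned int and unsigned char assumed for zend_uint and zend_uchar.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced copy of the value union; only its size matters here. */
typedef union {
    long lval;
    double dval;
    struct { char *val; int len; } str;
    void *ht;
} value_sketch;

/* Hypothetical mirror of _zval_struct. */
typedef struct {
    value_sketch value;
    unsigned int refcount;   /* zend_uint assumed to be unsigned int */
    unsigned char type;      /* zend_uchar assumed to be unsigned char */
    unsigned char is_ref;
} zval_sketch;

/* Sum of the member sizes, ignoring any padding. */
size_t zval_member_bytes(void) {
    return sizeof(value_sketch) + sizeof(unsigned int) + 2 * sizeof(unsigned char);
}

/* Size the compiler actually uses for one zval_sketch. */
size_t zval_bytes(void) {
    return sizeof(zval_sketch);
}

/* Padding bytes after the last member. */
size_t zval_tail_padding(void) {
    size_t end_of_last_member = offsetof(zval_sketch, is_ref) + sizeof(unsigned char);
    assert(sizeof(zval_sketch) >= end_of_last_member);
    return sizeof(zval_sketch) - end_of_last_member;
}
```

On a typical 64-bit Linux build this should report 22 member bytes, a structure size of 24 and 2 bytes of tail padding.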



Garbage collector for circular references (PHP 5.3)





PHP 5.3 introduced a new garbage collector for circular references. For this, PHP has to store some additional information. I do not want to explain here how it works; you can find the necessary information in the manual. For our size calculation, what matters is that each zval is wrapped in a zval_gc_info:



    typedef struct _zval_gc_info {
        zval z;
        union {
            gc_root_buffer *buffered;
            struct _zval_gc_info *next;
        } u;
    } zval_gc_info;




As you can see, Zend only adds a union that contains two pointers. As you remember, the size of a union is determined by its largest component. Both components are 8-byte pointers, so the size of the union is also 8 bytes.



If we add this to the 24 bytes obtained above, we get 32 bytes. Multiply this by 100,000 and we get 3.05 MB.
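The wrapper's cost can be measured the same way. The sketch below assumes the same stand-in types as before (all names ending in _sketch are hypothetical, and void * replaces gc_root_buffer *). The helper to_mib converts a byte count into the MB figures used in this article, which are multiples of 1024 * 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

typedef union {
    long lval;
    double dval;
    struct { char *val; int len; } str;
    void *ht;
} value_sketch;

typedef struct {
    value_sketch value;
    unsigned int refcount;
    unsigned char type;
    unsigned char is_ref;
} zval_sketch;

/* Hypothetical mirror of zval_gc_info: the zval plus a union of two pointers. */
typedef struct zval_gc_info_sketch {
    zval_sketch z;
    union {
        void *buffered;                       /* gc_root_buffer * in the original */
        struct zval_gc_info_sketch *next;
    } u;
} zval_gc_info_sketch;

/* Extra bytes the GC wrapper adds on top of a plain zval. */
size_t gc_overhead_bytes(void) {
    assert(sizeof(zval_gc_info_sketch) >= sizeof(zval_sketch));
    return sizeof(zval_gc_info_sketch) - sizeof(zval_sketch);
}

/* Converts bytes to MB as used in the article (1 MB = 1024 * 1024 bytes). */
double to_mib(size_t bytes) {
    return (double)bytes / (1024.0 * 1024.0);
}
```

With 8-byte pointers, gc_overhead_bytes() should be 8, and to_mib(32 * 100000) comes out at roughly 3.05.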



Zend Memory Manager





C, unlike PHP, does not manage memory for you. You have to keep track of memory allocations yourself. For this, PHP uses its own memory manager optimized for its needs: the Zend Memory Manager. Zend MM is based on Doug Lea's malloc and adds PHP-specific features and optimizations (such as a memory limit, cleanup after each request, and the like).



What matters to us here is that the MM adds a header to every memory allocation that passes through it. The header is defined as follows:



    typedef struct _zend_mm_block {
        zend_mm_block_info info;
    #if ZEND_DEBUG
        unsigned int magic;
    # ifdef ZTS
        THREAD_T thread_id;
    # endif
        zend_mm_debug_info debug;
    #elif ZEND_MM_HEAP_PROTECTION
        zend_mm_debug_info debug;
    #endif
    } zend_mm_block;

    typedef struct _zend_mm_block_info {
    #if ZEND_MM_COOKIES
        size_t _cookie;
    #endif
        size_t _size;    // size of the allocation
        size_t _prev;    // previous block
    } zend_mm_block_info;




As you can see, the definition includes many checks on compilation options. If any of these options is enabled, the allocation header gets larger, and it is largest if you compile PHP with heap protection, thread safety, debugging and MM cookies.



For this example, we will assume that all these options are disabled. In that case only the two size_t components _size and _prev remain. A size_t is 8 bytes (on 64 bit), so the header is 16 bytes in size, and this header is added to every memory allocation.



So we need to adjust the size of the zval again. Because of this header it is not 32 bytes but 48. Multiply by our 100,000 elements and we get 4.58 MB. The real size is 13.97 MB, so we have already covered about a third.
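The per-allocation cost can be modelled with a two-field header. This is a simplified sketch: block_header_sketch is a hypothetical stand-in for the header with all optional fields compiled out, and the model ignores any rounding or minimum block size the real allocator may apply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the allocation header without debug, cookie or
 * heap-protection fields: just the two size_t members. */
typedef struct {
    size_t size;
    size_t prev;
} block_header_sketch;

/* Bytes consumed by one allocation of `payload` bytes in this model. */
size_t allocation_bytes(size_t payload) {
    return sizeof(block_header_sketch) + payload;
}

/* Bytes consumed by `count` separate allocations of `payload` bytes each. */
size_t total_allocation_bytes(size_t count, size_t payload) {
    size_t each = allocation_bytes(payload);
    assert(count == 0 || each <= (size_t)-1 / count); /* guard against overflow */
    return count * each;
}
```

With an 8-byte size_t, allocation_bytes(32) gives the 48 bytes per zval used above, and total_allocation_bytes(100000, 32) gives 4,800,000 bytes, which is about 4.58 MB.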



Buckets





So far, we have considered the values on their own. But the array structure itself also takes a lot of space in PHP. In fact, the term "array" is poorly chosen: in PHP, an array is actually a hash table / dictionary. So how do hash tables work? Basically, a hash is generated for each key, and this hash is used as an offset into a “real” C array. Hashes can collide, so all elements that have the same hash are stored in a linked list. When accessing an element, PHP first calculates the hash, looks up the right bucket, and then traverses the linked list, comparing element by element until it finds an exact match. A bucket is defined as follows (zend_hash.h#54):



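    typedef struct bucket {
        ulong h;                     // the hash (or the key itself for integer keys)
        uint nKeyLength;             // length of the key (for string keys)
        void *pData;                 // the data
        void *pDataPtr;              // ??? what is this ???
        struct bucket *pListNext;    // PHP arrays are ordered: this is the next element in that order
        struct bucket *pListLast;    // and this is the previous one
        struct bucket *pNext;        // next element in this (doubly) linked list
        struct bucket *pLast;        // previous element in this (doubly) linked list
        const char *arKey;           // the key (for string keys)
    } Bucket;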




As you can see, a lot of data has to be stored to get the kind of abstract array that PHP uses (PHP arrays are arrays, dictionaries and linked lists at the same time, which of course requires a lot of information). The sizes of the individual components are: 8 bytes for the ulong, 4 bytes for the uint, and 7 times 8 bytes for the pointers. That adds up to 68 bytes. Add alignment and you get 72 bytes.
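Unlike in the zval, the alignment gap here falls in the middle of the structure, right after the 4-byte key length. A small sketch makes this visible. bucket_sketch is a hypothetical mirror of the definition above, with unsigned long and unsigned int assumed for ulong and uint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the bucket layout. */
typedef struct bucket_sketch {
    unsigned long h;
    unsigned int nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket_sketch *pListNext;
    struct bucket_sketch *pListLast;
    struct bucket_sketch *pNext;
    struct bucket_sketch *pLast;
    const char *arKey;
} bucket_sketch;

/* Sum of the member sizes: one ulong, one uint and seven pointers. */
size_t bucket_member_bytes(void) {
    return sizeof(unsigned long) + sizeof(unsigned int) + 7 * sizeof(void *);
}

/* Size the compiler actually uses for one bucket_sketch. */
size_t bucket_bytes(void) {
    return sizeof(bucket_sketch);
}

/* Padding between nKeyLength and the first pointer that follows it. */
size_t key_length_padding(void) {
    size_t end_of_key_length = offsetof(bucket_sketch, nKeyLength) + sizeof(unsigned int);
    assert(offsetof(bucket_sketch, pData) >= end_of_key_length);
    return offsetof(bucket_sketch, pData) - end_of_key_length;
}
```

On a typical 64-bit Linux build this should give 68 member bytes, a structure size of 72 and 4 bytes of padding after nKeyLength.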



Buckets, like zvals, need the 16-byte allocation header, which gives us 88 bytes. We also need to store pointers to these buckets in the “real” C array (Bucket **arBuckets;) that I mentioned above, which adds another 8 bytes per element. So overall, each bucket costs 96 bytes of memory.
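To picture how an array of bucket pointers and the collision lists fit together, here is a deliberately small chained hash table for integer keys. It is an illustration of the general technique only, not PHP's implementation: the names table, node, table_put, table_get and table_free are hypothetical, the table has a fixed size, and there is no resizing or insertion-order list.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 8u  /* a power of two, so (h & (TABLE_SIZE - 1)) selects a slot */

typedef struct node {
    unsigned long h;     /* integer key, used directly as the hash */
    int value;
    struct node *next;   /* next entry in the same collision chain */
} node;

/* The "real" C array: one pointer per slot, NULL while the slot is unused. */
typedef struct {
    node *slots[TABLE_SIZE];
} table;

/* Inserts or overwrites a key; returns 0 on success, -1 if allocation fails. */
int table_put(table *t, unsigned long h, int value) {
    node **slot = &t->slots[h & (TABLE_SIZE - 1)];
    for (node *n = *slot; n != NULL; n = n->next) {
        if (n->h == h) {
            n->value = value;
            return 0;
        }
    }
    node *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) {
        return -1;
    }
    fresh->h = h;
    fresh->value = value;
    fresh->next = *slot;   /* prepend to the chain of this slot */
    *slot = fresh;
    return 0;
}

/* Walks the chain of the key's slot; returns NULL if the key is absent. */
const int *table_get(const table *t, unsigned long h) {
    for (const node *n = t->slots[h & (TABLE_SIZE - 1)]; n != NULL; n = n->next) {
        if (n->h == h) {
            return &n->value;
        }
    }
    return NULL;
}

/* Releases every node and resets all slots to NULL. */
void table_free(table *t) {
    for (unsigned int i = 0; i < TABLE_SIZE; i++) {
        node *n = t->slots[i];
        while (n != NULL) {
            node *next = n->next;
            free(n);
            n = next;
        }
        t->slots[i] = NULL;
    }
}
```

Keys 3 and 11 land in the same slot of this 8-slot table, so looking up either one has to walk the chain and compare keys, which is the traversal described earlier. Every entry costs a separately allocated node plus a pointer slot in the array, the same two kinds of cost counted for PHP's buckets.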



So, if we need one bucket for each value, that is 96 bytes for the bucket and 48 bytes for the zval, which is 144 bytes in total. For 100,000 elements this comes to 14,400,000 bytes, or 13.73 MB.
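The whole estimate is plain addition and multiplication, so it can be written down as a few helper functions. The constants are the 64-bit figures from the summary table. The function names are hypothetical.

```c
#include <assert.h>

/* Per-element sizes in bytes, taken from the 64-bit column of the summary table. */
enum {
    ZVAL_BYTES = 24,
    GC_INFO_BYTES = 8,
    ALLOC_HEADER_BYTES = 16,
    BUCKET_BYTES = 72,
    SLOT_POINTER_BYTES = 8
};

/* One value: zval + GC info + allocation header. */
unsigned long zval_total_bytes(void) {
    return ZVAL_BYTES + GC_INFO_BYTES + ALLOC_HEADER_BYTES;
}

/* One array slot: bucket + allocation header + pointer in the bucket array. */
unsigned long bucket_total_bytes(void) {
    return BUCKET_BYTES + ALLOC_HEADER_BYTES + SLOT_POINTER_BYTES;
}

/* Estimated bytes for an array of `elements` integers. */
unsigned long array_estimate_bytes(unsigned long elements) {
    unsigned long per_element = zval_total_bytes() + bucket_total_bytes();
    assert(elements <= (unsigned long)-1 / per_element); /* guard against overflow */
    return elements * per_element;
}
```

array_estimate_bytes(100000) returns 14,400,000, which leaves 249,024 bytes of the measured 14,649,024 unexplained at this point.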



Riddle solved.



Wait, there are still 0.24 MB left!





These last 0.24 MB are due to uninitialized buckets: the size of the “real” C array should ideally be about equal to the number of elements. This way we get the fewest collisions (without wasting a lot of memory). But PHP obviously cannot reallocate the whole array every time an element is added, that would be really slow. Instead, PHP doubles the size of the internal bucket array whenever it hits the limit. Thus, the size of the array is always a power of two.



In our case, this is 2^17 = 131,072. But we only need 100,000 of these buckets, so 31,072 buckets are left unused. The memory for these buckets themselves is not allocated (so we do not pay the full 96 bytes), but the memory for the pointer to each bucket (stored in the internal bucket array) is still used. Therefore, we use an additional 8 bytes (per pointer) * 31,072 elements. That is 248,576 bytes or 0.23 MB, which matches the missing memory. (Of course, there are still a few bytes missing, but I do not want to cover everything completely. These are things like the hash table structure itself, variables, etc.)
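The doubling rule and the cost of the unused pointers can be sketched as follows. The starting size of 8 slots is an assumption made for the illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Smallest power-of-two slot count that holds `elements`, reached by doubling.
 * The starting size of 8 is an assumption of this sketch. */
unsigned long table_slots(unsigned long elements) {
    assert(elements <= ULONG_MAX / 2 + 1); /* keeps the doubling from overflowing */
    unsigned long slots = 8;
    while (slots < elements) {
        slots *= 2;
    }
    return slots;
}

/* Bytes spent on pointer slots that do not point to any bucket. */
unsigned long unused_pointer_bytes(unsigned long elements, unsigned long pointer_size) {
    return (table_slots(elements) - elements) * pointer_size;
}
```

For 100,000 elements, table_slots returns 131,072, and unused_pointer_bytes(100000, 8) returns 248,576, the figure derived above.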



The riddle is really solved.



What does this tell us?





PHP is not C. That is all this should tell us. You cannot expect a super dynamic language like PHP to use memory as efficiently as C. You just can't.



But, if you want to save memory, you can consider using SplFixedArray for large static arrays.



Let's look at the modified script:



    <?php
    $startMemory = memory_get_usage();
    $array = new SplFixedArray(100000);
    for ($i = 0; $i < 100000; ++$i) {
        $array[$i] = $i;
    }
    echo memory_get_usage() - $startMemory, ' bytes';




It basically does the same thing, but if you run it, you will notice that it uses “only” 5,600,640 bytes. That is 56 bytes per element, much less than the 144 bytes per element of a regular array. This is because a fixed array does not need the bucket structure: only one zval (48 bytes) and one pointer (8 bytes) are required per element, which gives us the observed 56 bytes.
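The same kind of estimate works for the fixed array. The per-element figures are the ones quoted above for a 64-bit build. The function name is hypothetical.

```c
#include <assert.h>

/* Per-element cost of a fixed array in bytes: one zval (with GC info and
 * allocation header) plus one pointer, as quoted for a 64-bit build. */
enum { FIXED_ZVAL_BYTES = 48, FIXED_POINTER_BYTES = 8 };

/* Estimated bytes for a fixed array of `elements` integers. */
unsigned long fixed_array_estimate_bytes(unsigned long elements) {
    unsigned long per_element = FIXED_ZVAL_BYTES + FIXED_POINTER_BYTES;
    assert(elements <= (unsigned long)-1 / per_element); /* guard against overflow */
    return elements * per_element;
}
```

For 100,000 elements the estimate is 5,600,000 bytes, which is 640 bytes short of the measured 5,600,640. The estimate does not account for that remainder.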



P.S. Please send any comments about the translation by private message, and I will try to fix things promptly.

Source: https://habr.com/ru/post/141093/


