
Comparing the performance of iterating over arrays with for() and foreach()

I would like to draw attention to one non-obvious feature of PHP.
Suppose we have an array with integer indices
$arr = array( $val1, $val2, ..., $valn ); 
This array can be looped over in two ways:
 foreach($arr as $k => $v ) {...} 
and
 $n = count( $arr ); for($k = 0; $k < $n; $k++ ) {...} 

It seems fairly obvious that the second method should be, if not faster, then certainly no slower.
Let's see.
- No. No benchmarks. Only code!

In fact, the second method is slower! This is especially surprising to those with a background in C or C++, where indexing an array is a simple pointer offset.
I discovered this behavior quite a long time ago, but never got around to figuring out what was behind it.

So why does this happen?


Let's look at how arrays are arranged internally in PHP.
Internally, in the Zend core, an array is represented by several interrelated structures.
The Bucket structure describes a "pointer" to the current position in the array:
    typedef struct bucket {
        ulong h;                  /* numeric index (or hash of the string key) */
        uint nKeyLength;          /* key length (0 for a numeric key) */
        void *pData;              /* pointer to the stored value */
                                  /* [for array values: (zval **)pos->pData] */
        void *pDataPtr;
        struct bucket *pListNext; /* next element of the array */
        struct bucket *pListLast; /* previous element of the array */
        struct bucket *pNext;     /* next bucket in the same arBuckets slot */
        struct bucket *pLast;     /* previous bucket in the same arBuckets slot */
        const char *arKey;        /* string key, if the element has one */
    } Bucket;

    typedef Bucket* HashPosition;

The HashTable structure is the array itself in the internal representation:
    typedef struct _hashtable {
        uint nTableSize;
        uint nTableMask;
        uint nNumOfElements;      /* number of elements in the array */
        ulong nNextFreeElement;
        Bucket *pInternalPointer;
        Bucket *pListHead;        /* first element of the array */
        Bucket *pListTail;        /* last element of the array */
        Bucket **arBuckets;       /* the hash table itself: an array of
                                     bucket chains, indexed by masked hash */
        dtor_func_t pDestructor;
        zend_bool persistent;
        unsigned char nApplyCount;
        zend_bool bApplyProtection;
    } HashTable;

Even this much information is enough to see the reason: the elements of an array are linked into a doubly linked list.
A doubly linked list allows fast insertion and reasonably fast search, but random access into it is a slow operation.

Now let's see how Zend actually iterates over the elements of an array.

    /* Move the internal pointer to the next element. */
    int zend_hash_move_forward_ex(HashTable *ht, HashPosition *pos)
    {
        HashPosition *current = pos ? pos : &ht->pInternalPointer;
        if (*current) {
            *current = (*current)->pListNext;
            return SUCCESS;
        } else
            return FAILURE;
    }

    /* Move the internal pointer to the previous element.
       Few people iterate backwards in PHP, but the Zend core
       uses this function internally. */
    int zend_hash_move_backwards_ex(HashTable *ht, HashPosition *pos)
    {
        HashPosition *current = pos ? pos : &ht->pInternalPointer;
        if (*current) {
            *current = (*current)->pListLast;
            return SUCCESS;
        } else
            return FAILURE;
    }

Here, I think, everything is clear without further explanation: the pointer to the current element is simply replaced by the pointer to the next element, whose address is stored in the current one.

Now let's see how access by index works.
    int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
    {
        uint nIndex;
        Bucket *p;

        nIndex = h & ht->nTableMask;
        p = ht->arBuckets[nIndex];
        while (p != NULL) {
            if ((p->h == h) && (p->nKeyLength == 0)) {
                *pData = p->pData;
                return SUCCESS;
            }
            p = p->pNext;
        }
        return FAILURE;
    }

Some explanation is needed here.
I won't bore readers with an overly detailed description of how the arBuckets array is organized (a C array of HashPosition, i.e. of pointers to Bucket).
Suffice it to say that the lookup walks sequentially through part of the internal hash table — the bucket chain selected by the masked hash — until it finds the desired value.

Conclusions


 foreach($arr as $i => $v ){...} 
Enumerates all the elements of the array quite quickly, fetching them one after another. The complexity of this operation is O(n).

 for($i = 0; $i < $n; $i++ ){...} 
Performs an index lookup in the internal hash table at every iteration.

One might estimate the complexity of the latter as the sum of an arithmetic progression, n * (n + 1) / 2, i.e. of order O(n²) — if every lookup had to scan the whole hash table. In reality only part of a chain is scanned on each lookup, so the complexity lies somewhere between O(n) and O(n²), and for large arrays it is closer to O(n). But even in that case, iterating over an array with access by key is the more expensive operation.

As for arrays with string keys (or mixed integer and string keys): everything said above applies to them as well, with the difference that accessing a string key costs on average 8 times (at most 16 times) more than accessing an integer key.

Source: https://habr.com/ru/post/216103/

