array - complex, flexible, hybrid, combines the behavior of list and linked map . But we use it for everything, because PHP adheres to a pragmatic approach : to have an extremely correct, sensible and realistic way of solving a problem, coming from practical, not theoretical, reasoning. array allows you to do work, although there are so many about it in lectures on computer science. But, unfortunately, complexity comes with flexibility.array been reworked . But arrays still adhere to the principle “optimized for everything; optimized for nothing, “not everything is perfect, there is room for improvement.What about SPL data structures?Unfortunately ... they are terrible. Previously, to PHP7, they offered some advantages, but now we have reached the point when using SPL has no practical meaning.
Why can't we just fix and improve them?Yes, we could, but I believe that their design and implementation are so poor that it would be better to find a more modern replacement.
"SPL data structures are horribly designed."
- Anthony Ferrara
php-ds is an extension for PHP7 that adds data structures. This post briefly covers the behavior, performance, and benefits of each. Also at the end you will find a list of answers to the expected questions.Ds\Collection , Sequence , HashableVector , Deque , Stack , Queue , PriorityQueue , Map , SetCollection is a basic interface that covers the common functionality: foreach , echo , count , print_r , var_dump , serialize , json_encode , and clone .Sequence describes the behavior of elements organized into a single, linear dimension. In some languages, such a structure is called a List . Like an array that uses incremental keys, with the exception of some features:[0, 1, 2, …, size - 1][0, size - 1]array as a list (without keys)SplDoublyLinkedList and SplFixedArrayVector is a Sequence that combines values into a continuous buffer, increasing and decreasing automatically. This is the most efficient sequential data structure, since the element index is a direct reflection of its index in the buffer, and an increase in the vector will not affect the performance in any way.get , set , push and pop have O(1) complexityinsert , remove , shift and unshift have complexity O(n)The number one structure in Photoshop was Vectors.
- Sean Parent , CppCon 2015
Deque (pronounced “deck”) is a sequence of values combined into a continuous buffer that increases and decreases automatically. The name is a common abbreviation for double-ended queue . Used inside Ds\Queue .shift and unshift so fast that even Vector cannot compete with that.((head + position) % capacity) .get , set , push , pop , shift and unshift have complexity O(1)insert , remove have complexity O(n)2ⁿ )push 2 push random numbers operation. array , Ds\Vector and Ds\Deque work quickly, but SplDoublyLinkedList consistently shows a result more than 2 times worse .SplDoublyLinkedList allocates memory for each value separately, which is why an expected increase in memory occurs. array and Ds\Deque in their implementation, allocate memory in portions to maintain sufficient volume for 2ⁿ elements. Ds\Vector has a growth factor of 1.5, which entails an increase in the number of memory allocations, but less consumption in general.

unshift single element in a sequence of 2 unshift values. The time required to set the values is not taken into account.array_unshift has O(n) complexity: whenever the sample size doubles, the time required for unshift . This is due to the fact that each numerical index in the range [1, size - 1] must be updated.
Ds\Vector::unshift also O(n) , so why is it much faster? Note that array stores each value in a bucket along with its key and hash. Therefore, you have to check each item and update the hash if the index is numeric. In fact, array_unshift allocates a new array for this and replaces the old one when all values are copied.memmove operation.Ds\Deque and SplDoublyLinkedList in turn, are very fast, because the sample size does not affect the time for unshift , i.e. its complexity will be O(1) .pop operations. In other words when resizing from 2ⁿ to zeroarray always holds the allocated memory, even if its size is significantly reduced. Ds\Vector and Ds\Deque allow you to reduce the allocated resources by half if their size falls below a quarter of their current potential. SplDoublyLinkedList frees the memory after each deletion from the sample, so we can observe a linear decrease.
Ds\Stack uses the inside of Ds\Vector .SplStack inherited from SplDoublyLinkedList, so performance will be equivalent to comparing Ds\Vector to SplDoublyLinkedList from previous tests. Let's look at the time it takes to perform 2ⁿ pop operations, resizing from 2ⁿ to zero.
Ds\Queue uses inside of itself Ds\Deque . SplQueue inherited from SplDoublyLinkedList , so performance will be equivalent to comparing Ds\Deque with SplDoublyLinkedList shown in the previous benchmark.pop operations, which is a very costly operation.push 2ⁿ random numbers with a random priority in the queue. The same random numbers will be used for each of the tests. A random priority is also generated for the Queue , although it is not used.Ds\PriorityQueue works more than twice as fast as SplPriorityQueue and uses only 5% of its memory - this is 20 times more efficient memory solution .SplPriorityQueue when the SplPriorityQueue uses a similar internal structure? It all comes down to how values are paired with priority. SplPriorityQueue allows you to use any type of value to use as a variable, this leads to the fact that in each pair the priority takes 32 bytes .Ds\PriorityQueue only supports integer priorities, so 24 bytes are allocated to each pair. But this is still not enough difference to explain the result.SplPriorityQueue::insert , you will notice that it initializes the array to store the priority value pair .zval + HashTable + 8 * (Bucket + hash) + 2 * zend_string + (8 + 16) byte string payloads is actually allocated for each pair zval + HashTable + 8 * (Bucket + hash) + 2 * zend_string + (8 + 16) byte string payloads = 16 + 56 + 36 * 8 + 2 * 24 + 8 + 16 = 432 bytes (64 bits).SplPriorityQueue uses the same internal SplMaxHeap structure, which requires the value to be a zval type. The obvious (but inefficient) way to create a zval pair is that zval itself is used as an array .

spl_object_hash , which determines an object into a hash based on its handle: This means that two objects that would be considered equal when comparing would not have an equal hash, since they are not the same instance.Hashable introduces only two methods: hash and equals . Many other languages support this initially: in Java, hashCode and equals , or in Python, ___hash___ and __eq__ . There were several RFCs adding this behavior to PHP, but none of them were accepted.spl_object_hash if the keys of the objects stored in them do not implement Hashable in themselves.Hashable interface: Map and Set .Map is a sequential collection of key-value pairs, almost identical to array in a similar context. Keys can be of any type , the only condition is uniqueness. When you add the key again, the values are replaced.array , the insertion order is preserved.arrayHashable interfaceput , get , remove and containsKey have complexity O(1)array if there are object keys.array and Ds\Map identical. However, the array will always keep the allocated memory when Ds\Map , in turn, will free up memory if the size falls below a quarter of its potential.

Set is a collection of unique values . The tutorials will tell you that in the Set structure, the values are unordered, unless the implementation provides otherwise. Take for example Java, java.util.Set is an interface with two main implementations: HashSet and TreeSet . HashSet provides O(1) complexity for add and remove , a TreeSet provides a sorted dataset, but add and remove complexity increases to O(log n) .Set uses the same internal structure as Map , also based on array . This means that Set can be sorted in O(n * log(n)) when needed, otherwise it is as simple as Map and array .add , remove and contains have complexity O(1)Hashable interfaceSplObjectStorage only supports objects).intersection , difference , union , exclusive or )push , pop , insert , shift or unshiftget has complexity O(n) if there are remote values before indexing, otherwise - O(1)stdClass . It shows that Ds\Set slightly faster than SplObjectStorage , and uses about half as much memory.

array_unique , which creates a new array containing only unique values. But it is important to keep in mind that the values in the array are not indexed , in_array is a linear search with complexity O(n) . array_unique works only with values, excluding keys, each checking for the presence of an array value is a linear search, which gives us a total of O(n²) complexity in time and O(n) memory consumption.
PHP 7.0.3 build for 2015 Macbook Pro . Results may vary by version and platform.Stack , Queue , Set and Map not interfaces?Deque instead of Vector ?shift and unshift , use Vector . For convenient typehipping, you can specify as a type of Sequence .php-ds API design applies the “Composition over inheritance .SplStack extends SplDoublyLinkedList , which supports random access by index, shift and unshift - so technically it is not a stack .ArrayDeque has three methods for adding elements: add , addLast and push . This is not bad, because ArrayDeque implements Deque and Queue , which explains the simultaneous presence of addLast and push . However, all three methods at once, doing the same thing, cause confusion and inconsistency.java.util.Stack extended java.util.Vector , thereby stating that “a more complete and consistent set of LIFO operations is provided by the Deque interface and its implementations,” but Deque includes the addFirst and remove(x) methods that should not be part of the stack structure by API.array . You cannot inherit from an array , it forces you to develop your own API around you, using it only to store actual data.ds class in the global namespace?
LinkedList class actually appeared first, it seemed like a good start. But in the end I decided to remove it when I realized that it could not compete with Vector or Deque in any case. The two main reasons for possible support are: allocation of overhead and locality of links .Vector and Deque , allocate a buffer of memory in advance, so there is no need to do it so often. They also do not need additional pointers to know what the value is before and what after, thereby reducing overhead costs.Vector is (1.5 * (size - 1)) * zval bytes, at least * 10 * zval *. In a doubly linked list, it will be used (size * (zval + 8 + 8)) . Therefore, a coherent list will use less memory than Vector only when its size is less than 6 elements.Vector and Deque significant advantage: the elements are physically next to each other.»Incomplete data in structures is the root of all the evils of performance. Specifically, please say no to linked lists. "
"There is almost nothing more harmful than what you can do to kill all the advantages of modern microprocessors than using a coherent list"
- Chandler Carruth ( CppCon 2014 )
Source: https://habr.com/ru/post/280262/
All Articles