array
- complex, flexible, hybrid, combines the behavior of list
and linked map
. But we use it for everything, because PHP adheres to a pragmatic approach : to have an extremely correct, sensible and realistic way of solving a problem, coming from practical, not theoretical, reasoning. array
allows you to do work, although there are so many about it in lectures on computer science. But, unfortunately, complexity comes with flexibility.array
been reworked . But arrays still adhere to the principle “optimized for everything; optimized for nothing, “not everything is perfect, there is room for improvement.What about SPL data structures?Unfortunately ... they are terrible. Previously, to PHP7, they offered some advantages, but now we have reached the point when using SPL has no practical meaning.
Why can't we just fix and improve them?Yes, we could, but I believe that their design and implementation are so poor that it would be better to find a more modern replacement.
"SPL data structures are horribly designed."
- Anthony Ferrara
php-ds
is an extension for PHP7 that adds data structures. This post briefly covers the behavior, performance, and benefits of each. Also at the end you will find a list of answers to the expected questions.Ds\
Collection
, Sequence
, Hashable
Vector
, Deque
, Stack
, Queue
, PriorityQueue
, Map
, Set
Collection
is a basic interface that covers the common functionality: foreach
, echo
, count
, print_r
, var_dump
, serialize
, json_encode
, and clone
.Sequence
describes the behavior of elements organized into a single, linear dimension. In some languages, such a structure is called a List
. Like an array
that uses incremental keys, with the exception of some features:[0, 1, 2, …, size - 1]
[0, size - 1]
array
as a list (without keys)SplDoublyLinkedList
and SplFixedArray
Vector
is a Sequence
that combines values into a continuous buffer, increasing and decreasing automatically. This is the most efficient sequential data structure, since the element index is a direct reflection of its index in the buffer, and an increase in the vector will not affect the performance in any way.get
, set
, push
and pop
have O(1)
complexityinsert
, remove
, shift
and unshift
have complexity O(n)
The number one structure in Photoshop was Vectors.
- Sean Parent , CppCon 2015
Deque
(pronounced “deck”) is a sequence of values combined into a continuous buffer that increases and decreases automatically. The name is a common abbreviation for double-ended queue . Used inside Ds\Queue
.shift
and unshift
so fast that even Vector
cannot compete with that.((head + position) % capacity)
.get
, set
, push
, pop
, shift
and unshift
have complexity O(1)
insert
, remove
have complexity O(n)
2ⁿ
)push
2 push
random numbers operation. array
, Ds\Vector
and Ds\Deque
work quickly, but SplDoublyLinkedList
consistently shows a result more than 2 times worse .SplDoublyLinkedList
allocates memory for each value separately, which is why an expected increase in memory occurs. array
and Ds\Deque
in their implementation, allocate memory in portions to maintain sufficient volume for 2ⁿ elements. Ds\Vector
has a growth factor of 1.5, which entails an increase in the number of memory allocations, but less consumption in general.unshift
single element in a sequence of 2 unshift
values. The time required to set the values is not taken into account.array_unshift
has O(n)
complexity: whenever the sample size doubles, the time required for unshift
. This is due to the fact that each numerical index in the range [1, size - 1]
must be updated.Ds\Vector::unshift
also O(n)
, so why is it much faster? Note that array
stores each value in a bucket
along with its key and hash. Therefore, you have to check each item and update the hash if the index is numeric. In fact, array_unshift
allocates a new array for this and replaces the old one when all values are copied.memmove
operation.Ds\Deque
and SplDoublyLinkedList
in turn, are very fast, because the sample size does not affect the time for unshift
, i.e. its complexity will be O(1)
.pop
operations. In other words when resizing from 2ⁿ to zeroarray
always holds the allocated memory, even if its size is significantly reduced. Ds\Vector
and Ds\Deque
allow you to reduce the allocated resources by half if their size falls below a quarter of their current potential. SplDoublyLinkedList
frees the memory after each deletion from the sample, so we can observe a linear decrease.Ds\Stack
uses the inside of Ds\Vector
.SplStack
inherited from SplDoublyLinkedList, so performance will be equivalent to comparing Ds\Vector
to SplDoublyLinkedList
from previous tests. Let's look at the time it takes to perform 2ⁿ pop
operations, resizing from 2ⁿ to zero.Ds\Queue
uses inside of itself Ds\Deque
. SplQueue
inherited from SplDoublyLinkedList
, so performance will be equivalent to comparing Ds\Deque
with SplDoublyLinkedList
shown in the previous benchmark.pop
operations, which is a very costly operation.push
2ⁿ random numbers with a random priority in the queue. The same random numbers will be used for each of the tests. A random priority is also generated for the Queue
, although it is not used.Ds\PriorityQueue
works more than twice as fast as SplPriorityQueue
and uses only 5% of its memory - this is 20 times more efficient memory solution .SplPriorityQueue
when the SplPriorityQueue
uses a similar internal structure? It all comes down to how values are paired with priority. SplPriorityQueue
allows you to use any type of value to use as a variable, this leads to the fact that in each pair the priority takes 32 bytes .Ds\PriorityQueue
only supports integer priorities, so 24 bytes are allocated to each pair. But this is still not enough difference to explain the result.SplPriorityQueue::insert
, you will notice that it initializes the array to store the priority value pair .zval + HashTable + 8 * (Bucket + hash) + 2 * zend_string + (8 + 16) byte string payloads
is actually allocated for each pair zval + HashTable + 8 * (Bucket + hash) + 2 * zend_string + (8 + 16) byte string payloads
= 16 + 56 + 36 * 8 + 2 * 24 + 8 + 16
= 432 bytes (64 bits).SplPriorityQueue
uses the same internal SplMaxHeap
structure, which requires the value to be a zval
type. The obvious (but inefficient) way to create a zval
pair is that zval
itself is used as an array
.spl_object_hash
, which determines an object into a hash based on its handle:
This means that two objects that would be considered equal when comparing would not have an equal hash, since they are not the same instance.Hashable
introduces only two methods: hash
and equals
. Many other languages support this initially: in Java, hashCode
and equals
, or in Python, ___hash___
and __eq__
. There were several RFCs adding this behavior to PHP, but none of them were accepted.spl_object_hash
if the keys of the objects stored in them do not implement Hashable
in themselves.Hashable
interface: Map
and Set
.Map
is a sequential collection of key-value pairs, almost identical to array
in a similar context. Keys can be of any type , the only condition is uniqueness. When you add the key again, the values are replaced.array
, the insertion order is preserved.array
Hashable
interfaceput
, get
, remove
and containsKey
have complexity O(1)
array
if there are object keys.array
and Ds\Map
identical. However, the array
will always keep the allocated memory when Ds\Map
, in turn, will free up memory if the size falls below a quarter of its potential.Set
is a collection of unique values . The tutorials will tell you that in the Set
structure, the values are unordered, unless the implementation provides otherwise. Take for example Java, java.util.Set
is an interface with two main implementations: HashSet
and TreeSet
. HashSet
provides O(1)
complexity for add
and remove
, a TreeSet
provides a sorted dataset, but add
and remove
complexity increases to O(log n)
.Set
uses the same internal structure as Map
, also based on array
. This means that Set
can be sorted in O(n * log(n))
when needed, otherwise it is as simple as Map
and array
.add
, remove
and contains
have complexity O(1)
Hashable
interfaceSplObjectStorage
only supports objects).intersection
, difference
, union
, exclusive or
)push
, pop
, insert
, shift
or unshift
get
has complexity O(n)
if there are remote values before indexing, otherwise - O(1)
stdClass
. It shows that Ds\Set
slightly faster than SplObjectStorage
, and uses about half as much memory.array_unique
, which creates a new array
containing only unique values. But it is important to keep in mind that the values in the array are not indexed , in_array
is a linear search with complexity O(n)
. array_unique
works only with values, excluding keys, each checking for the presence of an array value is a linear search, which gives us a total of O(n²)
complexity in time and O(n)
memory consumption.PHP 7.0.3
build for 2015 Macbook Pro . Results may vary by version and platform.Stack
, Queue
, Set
and Map
not interfaces?Deque
instead of Vector
?shift
and unshift
, use Vector
. For convenient typehipping, you can specify as a type of Sequence
.php-ds
API design applies the “Composition over inheritance .SplStack
extends SplDoublyLinkedList
, which supports random access by index, shift
and unshift
- so technically it is not a stack .ArrayDeque
has three methods for adding elements: add
, addLast
and push
. This is not bad, because ArrayDeque
implements Deque
and Queue
, which explains the simultaneous presence of addLast
and push
. However, all three methods at once, doing the same thing, cause confusion and inconsistency.java.util.Stack
extended java.util.Vector
, thereby stating that “a more complete and consistent set of LIFO operations is provided by the Deque
interface and its implementations,” but Deque
includes the addFirst
and remove(x)
methods that should not be part of the stack
structure by API.array
. You cannot inherit from an array
, it forces you to develop your own API around you, using it only to store actual data.ds
class in the global namespace?LinkedList
class actually appeared first, it seemed like a good start. But in the end I decided to remove it when I realized that it could not compete with Vector
or Deque
in any case. The two main reasons for possible support are: allocation of overhead and locality of links .Vector
and Deque
, allocate a buffer of memory in advance, so there is no need to do it so often. They also do not need additional pointers to know what the value is before and what after, thereby reducing overhead costs.Vector
is (1.5 * (size - 1)) * zval
bytes, at least * 10 * zval *. In a doubly linked list, it will be used (size * (zval + 8 + 8))
. Therefore, a coherent list will use less memory than Vector
only when its size is less than 6 elements.Vector
and Deque
significant advantage: the elements are physically next to each other.»Incomplete data in structures is the root of all the evils of performance. Specifically, please say no to linked lists. "
"There is almost nothing more harmful than what you can do to kill all the advantages of modern microprocessors than using a coherent list"
- Chandler Carruth ( CppCon 2014 )
Source: https://habr.com/ru/post/280262/
All Articles