📜 ⬆️ ⬇️

Data Structures, PHP

This post is a translation and is intended for beginners. Well, or for those who have forgotten lectures from the initial courses of their universities. Most likely, this material has already come across to Habré in one way or another modification, but here the emphasis on PHP and its features.

Data Structures or Abstract Data Type ( ADT ) is a model defined as a set of operations that can be applied to itself and limited by what result these operations produce.
Most of us encounter a stack and a queue in everyday life, but what is there in common between a queue in a supermarket and a data structure? We will try to understand this in this article, where trees will also be described.

http://www.thisiscolossal.com/2013/01/a-wooden-domino-tree-by-qiu-zhijie/

')

UPD : s01e02



  1. Stack
  2. Turn
  3. Tree


Stack


A stack is usually described as a certain set of objects, which is grouped together, where each element of a common set follows each other - a stack of books or trays stacked among themselves. In computer science, the stack is a set of objects that has the general rule of formation: the last object pushed onto the stack is retrieved first from the general list. Such a rule is also called “Last entered, first released” or LIFO . There is a reverse rule - the first one entered, the first one came out ( FIFO ), but more on that later.
This rule, LIFO, is used, for example, in vending machines for cigarettes, sweets - the last object loaded there will be issued first.

The abstract definition of a stack is a list, all operations for which are defined relative to one end, i.e. top of the stack.
Basic operations that define the stack:



You can also determine the stack with the maximum possible number of elements, but these are trifles. However, when the stack can no longer accept elements, the stack is overflowed and it returns a message about it (stack overflow). Well, the opposite situation - the removal of the element from the stack (stack underflow).

Knowing that our stack is defined as LIFO and its basic operations, we can write our stack through an array, especially since we already have for this basic operations push and pop. Our example would be:

<?php class ReadingList { protected $stack; protected $limit; public function __construct($limit = 10) { //   $this->stack = array(); //        $this->limit = $limit; } public function push($item) { // ,      if (count($this->stack) < $this->limit) { //       array_unshift($this->stack, $item); } else { throw new RunTimeException(' !'); } } public function pop() { if ($this->isEmpty()) { //     throw new RunTimeException(' !'); } else { //     return array_shift($this->stack); } } public function top() { return current($this->stack); } public function isEmpty() { return empty($this->stack); } } 


In this example, we used the PHP functions array_unshift () and array_shift () instead of array_push () and array_pop (), so our first stack element will always be on top, otherwise we would have the n-th stack element at the top. There is not much difference. Now add some items to our stack:

 <?php $myBooks = new ReadingList(); $myBooks->push('A Dream of Spring'); $myBooks->push('The Winds of Winter'); $myBooks->push('A Dance with Dragons'); $myBooks->push('A Feast for Crows'); $myBooks->push('A Storm of Swords'); $myBooks->push('A Clash of Kings'); $myBooks->push('A Game of Thrones'); 


And now we extract some elements from it:

 <?php echo $myBooks->pop(); //    'A Game of Thrones' echo $myBooks->pop(); //    'A Clash of Kings' echo $myBooks->pop(); //    'A Storm of Swords' 


What we have now is the top of the stack?

 <?php echo $myBooks->top(); //  'A Feast for Crows' 


If you call the pop () method again, “A Feast for Crows” will be removed from the stack. If you push, and then immediately pop, the stack will not change, since our stack works on the basis of "the last one went in, the first one went out." If we continue to pull items from the stack, then sooner or later we will get an exception with the message that the stack is empty.

SPLStack

PHP (SPL Extension) provides us with implementations of various data structures, including SplStack, starting with version 5.3. We can create our ReadingList simply by inheriting it:

 <?php class ReadingList extends SplStack { } 


SplStack gives us a bit more methods than we defined earlier, because SplStack implements a doubly linked list that has a capacity (the number of items in the stack) for implementing brute force.

A linked list, which is essentially a different abstract structure, is a list of nodes, each of which has a pointer to the next object. This structure can be represented in this way, with unidirectional brute force:

This image, K.O.


In the doubly linked list, each node has two pointers - to the previous and next nodes in the list. This structure allows you to search in two directions:

This image, K.O.


Going through the elements, we need to know where we have the entire list ends - for this are the very elements crossed out inside.

Turn


Here we come to the "first entered, first out" or the FIFO. Anyone who stood in the real queue knows that the one who took the place first, will be the first to leave. Exception - objects that only sign, ask, etc.

The basic operations for queues are:



PS For those who are familiar with the Prologue - in this case, the tail does not contain all the elements of the list, except for the head.

PHP also provides us with the SplQueue class (doubly linked list), only in this case the head of the list is the last element. We define our ReadingList as a queue:

 <?php class ReadingList extends SplQueue { } $myBooks = new ReadingList(); //       $myBooks->enqueue('A Game of Thrones'); $myBooks->enqueue('A Clash of Kings'); $myBooks->enqueue('A Storm of Swords'); 


SplQueue is inherited from SplDoublyLinkedList and implements the access interface as an array, thus allowing to access our queues and stacks through arrays:

 <?php $myBooks[] = 'A Feast of Crows'; $myBooks[] = 'A Dance with Dragons'; 


Remove a couple of items from the queue:

 <?php echo $myBooks->dequeue() . "\n"; //    'A Game of Thrones' echo $myBooks->dequeue() . "\n"; //    'A Clash of Kings' 


enqueue () is a pseudonym for push (), however dequeue () is not a pseudonym for pop () ! Pop () has a different behavior in the context of the queue, because if we use pop (), it will remove the element from the tail of the queue (A Dance with Dragons), since the FIFO rule is the main one in the queue.

You can see (without deleting) which element is the head of the list through the bottom () method:

 <?php echo $myBooks->bottom() . "\n"; //  'A Storm of Swords' 


Trees


ADT management is usually reduced to 3 operations: inserting elements into the structure, deleting and getting the element value from the structure. For stack and queues, these operations are position-dependent (xIFO). But what if we need to get information by value?

Imagine that we have such a label (the order of the elements does not matter):

This image, K.O.


Obviously, the stack or the queue will not help us in this case, since we will have to bypass all the values, if the one we need is at the end or is missing at all. Suppose the item is in the list. Then, to search for it, we will have to go, on average, for n / 2 elements, where n is the length of the entire list. More list - more time taken to crawl. To solve this search problem, you need to somehow arrange the data so that the search by structure is simplified. And here trees appear.

An abstract example of this structure is a table with the following operations:



Yes, this is similar to the well-known CRUD (Create Reading Update Delete) from databases, because trees and databases are closely related.

One of the ways to present our table is linear, i.e. describe it line by line. Such a record can be sorted, be sequential (i.e. records with a limited length or different lengths, with delimiters) or related (using pointers to data). Such an idea had an early database and file systems (FAT, for example). However, sequential writing has a minus - it is bad for inserts and deletes data, while linked recording allows you to dynamically allocate space for new data. Also, sequential writing with a fixed length is less effective than linked. Therefore, for a binary search tree, it is better to select a related entry.

Trees are just the implementation of non-linear search, they provide more efficient capabilities of two types - sequential and related, they support all table operations. Therefore, many modern databases use exactly trees (MyISAM uses trees for indexes).

This image, K.O.


As you can see from this picture, trees are a hierarchical structure, where between nodes there is a parent → child relationship. A node without descendants is an end node (leaf), a descendant without a parent is the root of a tree, the connections between nodes are edges. A node with two descendants is a simple tree and based on this, we can represent a tree as a recursive list of such nodes. It is worth noting that the tree in its structure resembles a doubly linked list.

Therefore, our tree can be described as follows:

 <?php class BinaryNode { public $value; //   public $left; //    BinaryNode public $right; //    BinaryNode public function __construct($item) { $this->value = $item; //   -  $this->left = null; $this->right = null; } } class BinaryTree { protected $root; //   public function __construct() { $this->root = null; } public function isEmpty() { return $this->root === null; } } 


Insert new values

Inserting new values ​​is a much more interesting topic. There are several solutions to this problem, based on the rotation and balancing of the tree, and for different tasks you can use different implementations of trees, such as red-black, AVL or B trees, having different performance indicators in the operations of inserting, deleting and traversing the tree.

For simplicity, we denote some of the rules of the simplest implementation: everything that is less than the current value goes left, more is right.
Repetitions will be excluded:

  1. If the tree is empty, insert [new node] as the root of the tree (obviously!)
  2. yet (the tree is not empty):
    • 2a. If ([current node] is empty) - insert here and stop;
    • 2b. If ([new node]> [current node]) - try to insert [new node] on the right and repeat step 2
    • 2c. If ([new node] <[current node]) - try to insert [new node] on the left and repeat step 2
    • 2d. Otherwise the value is already in the tree.


 <?php class BinaryTree { ... public function insert($item) { $node = new BinaryNode($item); if ($this->isEmpty()) { //  1 $this->root = $node; } else { //  1 $this->insertNode($node, $this->root); } } protected function insertNode($node, &$subtree) { if ($subtree === null) { //  2 $subtree = $node; } else { if ($node->value > $subtree->value) { //  2b $this->insertNode($node, $subtree->right); } else if ($node->value < $subtree->value) { //  2c $this->insertNode($node, $subtree->left); } else { //  ,  2d } } } } 


Deleting nodes is a completely different story and will not be affected. Perhaps some other time.

Tree traversal

Recall how we started from the root and went through the tree, knot by knot, to find an empty knot. There are 4 main tree traversal strategies:



The first three strategies are known as a detour in depth, which starts from the root of the tree (or the node designated as a node) and walk as deep as possible in the tree, as far as possible, before returning back. Each of these strategies is used for different purposes. Direct order, for example, is used when inserting new nodes (our case) or copying a subtree, the reverse is the other way around when removing nodes from the tree.

To understand how a symmetric pass works, we need to slightly change our example:

 <?php class BinaryNode { ... //      public function dump() { if ($this->left !== null) { $this->left->dump(); } var_dump($this->value); if ($this->right !== null) { $this->right->dump(); } } } class BinaryTree { ... public function traverse() { //        $this->root->dump(); } } 


Conclusion



I thank all who read to the conclusion, some more text and pictures :)

It is worth noting that the implementations of SplStack , SplQueue, and the doubly linked list are not fully covered. The PHP documentation for them hides quite a lot of methods, including counting the number of elements or using a structure as an iterator.

Also pay attention to this article from kpuzuc
Visualization of these structures, links to which can be found in a post from tangro

Performance benchmarks for implementations of these scopipastil structures from here
take a look
image image


image image


image image


As for the queue, the choice in favor of SPL is pretty obvious. The implementation of the stack does not particularly benefit the array, and the doubly linked list loses to the arrays by the number of operations per second. I honestly did not delve into the tests and most would like to know what it is connected with, but most likely with the fact that too many pulls from the "adjacent" interfaces, correct it if I am mistaken.

PS As always, in a personal error in the text and translation of the original.

Source: https://habr.com/ru/post/190176/


All Articles