
Data structures: binary trees. Part 2: an overview of balanced trees

The first article of the series

Intro


In this second article I will give an overview of the characteristics of various balanced trees. By characteristics I mean the basic principle of operation (without describing how the operations are implemented), the speed and additional memory consumption compared to an unbalanced tree, various interesting facts, and links to additional materials.

Red-black tree


Other names: red-black tree, RB tree.
In this structure, balance is maintained by coloring the vertices in two colors (red and black, as the name implies :), subject to the following rules:
  1. A red vertex cannot be the child of a red vertex.
  2. The black depth of every leaf is the same (black depth is the number of black vertices on the path from the root).
  3. The root of the tree is black.

Here we change the definition of a leaf somewhat: leaves are special null vertices that replace the missing children. We consider such vertices to be black.

Example:
(figure: an example of a red-black tree)
Let's see what the maximum depth of a valid red-black tree with n vertices can be.
Take the deepest leaf and suppose it is at depth h. By rule 1, at least half of the vertices on the path from the root to it are black, so the black height of the tree is at least h / 2. It can be shown that such a tree contains at least 2^(h/2) - 1 black vertices (since each black vertex with black depth k, if it is not a leaf, must have at least two descendants with black depth k + 1). Then 2^(h/2) - 1 <= n, i.e. h <= 2 * log2(n + 1).

All basic operations on a red-black tree can be implemented in O(h), that is, O(log n) by the bound proved above. The classic implementation is based on an analysis of a large number of cases and is quite difficult to understand. There are simpler and clearer variants, for example in the article by Chris Okasaki. Unfortunately, it describes only the insert operation. Its simplicity compared to the classical implementation comes from focusing on clarity rather than on minimizing the number of elementary tree modifications (rotations).

Implementing this type of tree requires storing 1 extra bit of information (the color) at each vertex. Due to alignment, this sometimes causes a large overhead; in such cases it is preferable to use structures without additional memory requirements.

Red-black trees are widely used: the implementation of set / map in standard libraries, various uses in the Linux kernel (request queues, ext3, etc.), and probably many other systems with similar needs.

Red-black trees are closely related to B-trees. One can say that they are identical to B-trees of order 4 (also known as 2-3-4 trees). You can read more about this in the Wikipedia article or in the book "Introduction to Algorithms" mentioned in the previous article.

Wikipedia article
Article in the English Wikipedia (with a description of operations)
Red-black tree visualizer

AA-tree


A modification of the red-black tree with one additional restriction: a red vertex can only be a right child. While a red-black tree is isomorphic to a 2-3-4 tree, an AA-tree is isomorphic to a 2-3 tree.

Thanks to the additional restriction, the operations are easier to implement than for a red-black tree (the number of cases to analyze is reduced). The bound on the height stays the same, 2 * log2(n), and the time efficiency is about the same; but since implementations usually store another characteristic instead of the color (the "level" of a vertex), the memory overhead grows to a byte per vertex.
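The reduced case analysis comes down to two small operations, usually called skew and split, applied on the way back up after an insertion. A minimal sketch (an editorial illustration under the standard level-based formulation, not code from the original article):

```python
# AA-tree insertion sketch: balance is restored by two tiny operations.

class Node:
    def __init__(self, key):
        self.key, self.level = key, 1
        self.left = self.right = None

def skew(t):
    # A left horizontal link (left child at the same level) is illegal:
    # rotate right to turn it into a right horizontal link.
    if t and t.left and t.left.level == t.level:
        l = t.left
        t.left, l.right = l.right, t
        return l
    return t

def split(t):
    # Two consecutive right horizontal links: rotate left and raise the
    # level of the new root.
    if t and t.right and t.right.right and t.right.right.level == t.level:
        r = t.right
        t.right, r.left = r.left, t
        r.level += 1
        return r
    return t

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    return split(skew(t))
```

Compare this with the case analysis of red-black insertion: skew and split together replace all the recoloring and rotation cases.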

Article in English Wikipedia

AVL-tree


Named after the Soviet mathematicians who invented it: G. M. Adelson-Velsky and E. M. Landis.

It imposes the following restriction on the tree: at any vertex, the heights of the left and right subtrees may differ by at most 1. It is easy to prove by induction that a tree of height h must contain at least F_h vertices, where F_i is the i-th Fibonacci number. Since F_i ~ phi^i (phi = (sqrt(5) + 1) / 2 is the golden ratio), the height of a tree with n vertices cannot exceed log2(n) / log2(phi) ≈ 1.44 * log2(n).
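The induction behind this bound is easy to check numerically. A short sketch (an editorial illustration): the minimal number of vertices N(h) in an AVL tree of height h satisfies N(h) = N(h-1) + N(h-2) + 1, since the worst case is one subtree of height h-1 and one of height h-2; this recurrence grows like the Fibonacci numbers.

```python
import math

def min_avl_size(h):
    # Minimal number of vertices in an AVL tree of height h,
    # counting a single vertex as height 0: N(0) = 1, N(1) = 2,
    # N(h) = N(h-1) + N(h-2) + 1.
    a, b = 1, 2
    for _ in range(h):
        a, b = b, a + b + 1
    return a

# Any AVL tree of height h has n >= min_avl_size(h) vertices, so its
# height stays within ~1.44 * log2(n), matching the estimate above.
for h in (10, 20, 30):
    n = min_avl_size(h)
    print(h, n, round(1.44 * math.log2(n + 2), 1))
```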

The implementation, like that of the red-black tree, is based on case analysis; it is rather difficult to understand (although, IMHO, simpler than the red-black tree) and achieves O(log n) complexity for all basic operations. Each vertex must store the difference between the heights of its left and right subtrees; since it never exceeds 1, two bits per vertex suffice.

A detailed description can be found in N. Wirth's book "Algorithms + Data Structures = Programs" or in A. Shen's book "Programming: Theorems and Problems".

Wikipedia article

Cartesian tree


Other names: treap (from tree + heap); in Russian, also "ducha" (likewise a portmanteau of the words for tree and heap).

If you draw the tree on a plane, the key corresponds to the x-coordinate of a vertex (because of the ordering). You can then introduce a y-coordinate (call it the priority) with the following property: the priority of a vertex is greater than the priorities of its children (the same property holds for the values in another binary-tree-based data structure, the heap). Hence the second version of the name.

It turns out that if the priorities are chosen at random, the height of a tree satisfying the heap property will most likely be O(log n). Numerical experiments show that the height is approximately 3 * log(n).

The implementation of the operations is simple and logical, which is why the structure is much loved in competitive programming. In benchmarks it came out as the most time-efficient (among red-black, AA and AVL trees, as well as skip lists (a structure that is not a binary tree but has a similar scope) and radix trees). Unfortunately, it has a fairly large memory overhead (2-4 bytes per vertex to store the priority) and is not suitable where guaranteed performance is required (for example, in an OS kernel).
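The simplicity comes from two primitives, split and merge, through which all other operations are expressed. A minimal sketch (an editorial illustration; the names and structure are this editor's assumptions, not code from the original article):

```python
import random

class Node:
    def __init__(self, key, prio):
        self.key, self.prio = key, prio
        self.left = self.right = None

def split(t, key):
    # Split t into two treaps: keys < key and keys >= key.
    if t is None:
        return None, None
    if t.key < key:
        l, r = split(t.right, key)
        t.right = l
        return t, r
    l, r = split(t.left, key)
    t.left = r
    return l, t

def merge(l, r):
    # Precondition: every key in l is less than every key in r.
    # The higher priority becomes the root (max-heap on priorities).
    if l is None:
        return r
    if r is None:
        return l
    if l.prio > r.prio:
        l.right = merge(l.right, r)
        return l
    r.left = merge(l, r.left)
    return r

def insert(t, key, rng=random):
    # A random priority is what keeps the expected height O(log n).
    node = Node(key, rng.random())
    l, r = split(t, key)
    return merge(merge(l, node), r)
```

Deletion is just as short: split out the key and merge the remaining halves.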

Splay tree


This data structure is very different from all of the above: it does not impose any restrictions on the structure of the tree. Moreover, during operation the tree can become completely unbalanced!

The basis of the splay tree is the splay operation. It finds the desired vertex (or the one closest to it, if it is absent) and "pulls" it to the root with a special sequence of elementary rotations (local operations on the tree that preserve the ordering property but change the structure). All tree operations are easily expressed through it. The sequence of rotations in splay is chosen so that the tree "magically" works quickly.
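A sketch of the operation (an editorial illustration of the classic recursive zig / zig-zig / zig-zag scheme, not code from the original article): splay brings the key, or the last vertex on the search path, to the root, and insertion is then expressed through it.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(t, key):
    # Pulls key (or the last vertex on the search path) to the root.
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:                 # zig-zig
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:               # zig-zag
            t.left.right = splay(t.left.right, key)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return rotate_right(t) if t.left else t
    else:
        if t.right is None:
            return t
        if key > t.right.key:                # zig-zig
            t.right.right = splay(t.right.right, key)
            t = rotate_left(t)
        elif key < t.right.key:              # zig-zag
            t.right.left = splay(t.right.left, key)
            if t.right.left is not None:
                t.right = rotate_right(t.right)
        return rotate_left(t) if t.right else t

def insert(t, key):
    # Splay the closest vertex to the root, then split around it.
    if t is None:
        return Node(key)
    t = splay(t, key)
    if key == t.key:
        return t
    n = Node(key)
    if key < t.key:
        n.right, n.left, t.left = t, t.left, None
    else:
        n.left, n.right, t.right = t, t.right, None
    return n
```

Search is simply `t = splay(t, key)` followed by a check of `t.key`; note that even a lookup restructures the tree.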

Given the magic of the splay operation, these trees are not just easy but very easy to implement, so they are also very popular at ACM ICPC, Topcoder, etc.

Clearly, in such a tree one cannot guarantee O(log n) complexity per operation (what if we are asked to find a deeply buried vertex in a currently unbalanced tree?). Instead, an amortized complexity of O(log n) per operation is guaranteed: any sequence of m operations on a tree of size n runs in O((n + m) * log n). Moreover, the splay tree has some magical properties that can make it much more effective in practice than the alternatives. For example, recently accessed vertices stay close to the root, so access to them is faster. Furthermore, it has been proved that for any fixed sequence of accesses, a splay tree works asymptotically no slower than any other implementation of binary trees. Another advantage is that there is no memory overhead, since no additional information needs to be stored.

Unlike the other variants, the search operation modifies the tree itself, so if the elements are accessed uniformly, the splay tree runs slower. In practice, however, it often gives a tangible performance gain. Tests confirm this: on access traces taken from Firefox, VMWare and Squid, the splay tree is 1.5-2 times faster than red-black and AVL trees, while on synthetic tests it is 1.5 times slower. Unfortunately, because individual operations carry no performance guarantee, splay trees are not applicable in real-time systems (for example, OS kernels or garbage collectors) or in general-purpose libraries.

Article in English Wikipedia
Original article by D. Sleator and R. Tarjan

Scapegoat tree


This tree is similar to the previous one in that it has no memory overhead, yet it is fully balanced. Moreover, the "rigidity" coefficient 1/2 < alpha < 1 of the tree can be chosen arbitrarily, and the height of the tree will be bounded above by k * log2(n) + 1, where k = 1 / log2(1/alpha). Unfortunately, modification operations have only amortized complexity, as with the previous tree.
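The key primitive behind the amortized bound is the O(size) rebuild of an overly deep ("scapegoat") subtree: flatten it into key order and reassemble it perfectly balanced. A minimal sketch of that rebuild (an editorial illustration; a full scapegoat tree also tracks subtree sizes to decide when to trigger it):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def flatten(t, out):
    # Collect the vertices of t in increasing key order.
    if t:
        flatten(t.left, out)
        out.append(t)
        flatten(t.right, out)

def rebuild(t):
    # O(size) rebalance: flatten the subtree, then rebuild it perfectly
    # balanced by repeatedly taking the median as the root.
    nodes = []
    flatten(t, nodes)
    def build(lo, hi):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        n = nodes[mid]
        n.left = build(lo, mid)
        n.right = build(mid + 1, hi)
        return n
    return build(0, len(nodes))
```

Since a subtree of size s is rebuilt only after about alpha-proportional many updates inside it, the O(s) cost amortizes to O(log n) per modification.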

The rigidity coefficient strongly affects the balance of performance: the "stiffer" the tree, the smaller its height and the faster the search, but the more work it takes to maintain the invariant during modifications. For example, since the AVL tree is "stiffer" than the red-black tree, searching in it is faster and modification is slower. With a scapegoat tree, the balance between these operations can be chosen to suit the specific use of the tree.

Article in English Wikipedia

A couple more words


The last two trees stand apart from their competitors. For example, only they can be used to efficiently implement the link/cut tree data structure that underlies the fastest known maximum-flow algorithms on graphs. On the other hand, because of their amortized nature, they cannot be used in many algorithms, in particular for building ropes. The properties of these trees, especially splay trees, are actively studied by theorists.

Besides balanced trees, there is another trick: implement an ordinary binary tree and periodically rebalance it during operation. Several algorithms exist for this, such as the DSW algorithm, which runs in O(n).

In the next installment


I will describe Cartesian trees and their implementation in more detail.

General links


Tree visualizer (can visualize all the trees in this overview)

Source: https://habr.com/ru/post/66926/

