
In one of my previous posts we discussed a fairly modern approach to building balanced search trees. This post is devoted to the implementation of AVL trees, probably the very first kind of balanced binary search tree, invented back in 1962 by the (then Soviet) scientists Adelson-Velsky and Landis. You can find many implementations of AVL trees on the web (for example, here), but nothing I have personally seen inspires much optimism, especially if you are trying to figure everything out from scratch. It is claimed everywhere that AVL trees are simpler than red-black trees, but looking at the code attached to such claims, you start to doubt this statement. The desire to explain plainly how AVL trees work was, in fact, the motivation for writing this post. The presentation is illustrated with C++ code.
The concept of the AVL tree

An AVL tree is, first of all, a binary search tree, whose keys satisfy the standard property: the key of any node of the tree is not less than any key in the left subtree of that node and not greater than any key in its right subtree. This means that the standard algorithm can be used to find a given key in an AVL tree. For simplicity, the rest of the presentation assumes that all keys in the tree are integers and distinct.
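That standard lookup can be sketched right away. The `find` helper below is my own illustration, not part of the original code; it assumes a node structure identical to the one defined later in this post:

```cpp
struct node // same layout as the node structure used throughout this post
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

// Standard binary-search-tree lookup: it works on an AVL tree unchanged,
// because an AVL tree is first of all a binary search tree.
node* find(node* p, int k)
{
    if( !p ) return 0;                    // empty subtree: key not present
    if( k < p->key ) return find(p->left, k);
    if( k > p->key ) return find(p->right, k);
    return p;                             // k == p->key
}
```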
A distinguishing feature of the AVL tree is that it is balanced in the following sense: for any node of the tree, the height of its right subtree differs from the height of its left subtree by at most one. It has been proved that this property is enough for the height of the tree to depend logarithmically on the number of its nodes: the height h of an AVL tree with n keys lies in the range from log₂(n + 1) to 1.44·log₂(n + 2) − 0.328. And since the basic operations on binary search trees (searching for, inserting and deleting nodes) take time linear in the tree height, we get a guaranteed logarithmic dependence of the running time of these algorithms on the number of keys stored in the tree. Recall that randomized search trees guarantee balance only in a probabilistic sense: for large n the probability of getting a badly unbalanced tree, although negligible, is still nonzero.
Node structure
We will represent the nodes of the AVL tree with the following structure:

```cpp
struct node // a node of the AVL tree
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};
```

The key field stores the key of the node, the height field stores the height of the subtree rooted at this node, and the left and right fields are pointers to the left and right subtrees. The simple constructor creates a new node (of height 1) with the given key k.
Traditionally, the nodes of an AVL tree store not the height but the difference between the heights of the right and left subtrees (the so-called balance factor), which can take only three values: −1, 0 and 1. Note, however, that this difference is still stored in a variable whose size is at least one byte (unless you invent some tricky scheme for "efficiently" packing such quantities). Recall that the height satisfies h < 1.44·log₂(n + 2); this means, for example, that with n = 10⁹ (a billion keys, more than 10 gigabytes of memory for storing the nodes) the tree height does not exceed h = 44, which fits comfortably into the same single byte as a balance factor would. Thus storing heights does not increase the amount of memory allocated for tree nodes, while it noticeably simplifies the implementation of some operations.
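The arithmetic behind that claim is easy to check. Here is a throwaway sketch (my own, not from the original post) that evaluates the upper bound:

```cpp
#include <cmath>

// Upper bound on the height of an AVL tree with n keys:
// h < 1.44 * log2(n + 2) - 0.328.
double avl_height_bound(double n)
{
    return 1.44 * std::log2(n + 2.0) - 0.328;
}
```

For n = 10⁹ this evaluates to roughly 42.7, so a one-byte height field has room to spare.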
We define three helper functions related to height. The first one is a wrapper over the height field; it can also work with null pointers (that is, with empty trees):

```cpp
unsigned char height(node* p)
{
    return p ? p->height : 0;
}
```
The second one computes the balance factor of the given node (and works only with non-null pointers):

```cpp
int bfactor(node* p)
{
    return height(p->right) - height(p->left);
}
```
The third function restores the correct value of the height field of the given node (provided that the values of this field in its right and left children are correct):

```cpp
void fixheight(node* p)
{
    unsigned char hl = height(p->left);
    unsigned char hr = height(p->right);
    p->height = (hl > hr ? hl : hr) + 1;
}
```
Note that all three functions are non-recursive, i.e. their running time is O(1).
Node balancing
In the process of adding or removing nodes in an AVL tree, a situation may arise in which the balance factor of some node becomes 2 or −2, that is, a subtree goes out of balance. To fix the situation we apply the well-known rotations around certain nodes of the tree. Recall that a simple right (left) rotation performs the following transformation of the tree:

The code implementing the right rotation looks like this (as usual, every function that changes the tree returns the new root of the resulting tree):

```cpp
node* rotateright(node* p) // right rotation around p
{
    node* q = p->left;
    p->left = q->right;
    q->right = p;
    fixheight(p);
    fixheight(q);
    return q;
}
```
The left rotation is a symmetric copy of the right one:

```cpp
node* rotateleft(node* q) // left rotation around q
{
    node* p = q->right;
    q->right = p->left;
    p->left = q;
    fixheight(q);
    fixheight(p);
    return p;
}
```
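As a sanity check on the rotations, consider rotating the left-leaning chain 3 → 2 → 1 to the right: the result should be a balanced tree rooted at 2. This sketch is my own addition and duplicates the definitions above so that it compiles on its own:

```cpp
struct node
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

unsigned char height(node* p) { return p ? p->height : 0; }

void fixheight(node* p)
{
    unsigned char hl = height(p->left);
    unsigned char hr = height(p->right);
    p->height = (hl > hr ? hl : hr) + 1;
}

// Right rotation around p, exactly as in the post: q (the left child)
// becomes the new root, and p adopts q's former right subtree.
node* rotateright(node* p)
{
    node* q = p->left;
    p->left = q->right;
    q->right = p;
    fixheight(p);
    fixheight(q);
    return q;
}

// Build the left-leaning chain 3 -> 2 -> 1 and rotate right at the root.
node* rotation_demo()
{
    node* p = new node(3);
    p->left = new node(2);
    p->left->left = new node(1);
    fixheight(p->left);    // heights bottom-up: node 2 gets height 2
    fixheight(p);          // node 3 gets height 3
    return rotateright(p); // expected: 2 at the root, children 1 and 3
}
```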
Let us now consider the case of imbalance in which the height of the right subtree of node p is greater than the height of its left subtree by 2 (the opposite case is symmetric and implemented similarly). Let q be the right child of p, and s the left child of q.

An analysis of the possible cases in this situation shows that, to fix the imbalance at node p, it suffices to perform either a simple left rotation around p or a so-called big left rotation around the same p. A simple rotation is performed when the height of the right subtree of node q is not less than the height of its left subtree: h(s) ≤ h(D).

A big rotation is applied when h(s) > h(D); in this case it reduces to two simple rotations: first a right rotation around q and then a left rotation around p.

The code performing the balancing reduces to checking these conditions and performing the rotations:

```cpp
node* balance(node* p) // balancing of node p
{
    fixheight(p);
    if( bfactor(p) == 2 )
    {
        if( bfactor(p->right) < 0 )
            p->right = rotateright(p->right); // big left rotation
        return rotateleft(p);
    }
    if( bfactor(p) == -2 )
    {
        if( bfactor(p->left) > 0 )
            p->left = rotateleft(p->left); // big right rotation
        return rotateright(p);
    }
    return p; // no balancing needed
}
```
The rotation and balancing functions described above contain neither loops nor recursion, so they execute in constant time, independent of the size of the AVL tree.
Inserting keys
Inserting a new key into an AVL tree is performed, by and large, the same way as in a plain search tree: we descend the tree, choosing the left or right direction depending on the result of comparing the key in the current node with the key being inserted. The only difference is that, when returning from the recursion (i.e. after the key has been inserted into either the right or the left subtree and that subtree has been balanced), the current node is balanced as well. It can be strictly proved that the imbalance arising at any node along the insertion path does not exceed two, so applying the balancing function described above is correct.

```cpp
node* insert(node* p, int k) // insert key k into the tree rooted at p
{
    if( !p ) return new node(k);
    if( k < p->key )
        p->left = insert(p->left, k);
    else
        p->right = insert(p->right, k);
    return balance(p);
}
```
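A quick way to convince yourself that the balancing works is to feed the tree an increasing sequence, which is the worst case for a plain binary search tree. The sketch below is my own; it duplicates the code above so that it is self-contained:

```cpp
struct node
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

unsigned char height(node* p) { return p ? p->height : 0; }
int bfactor(node* p) { return height(p->right) - height(p->left); }

void fixheight(node* p)
{
    unsigned char hl = height(p->left), hr = height(p->right);
    p->height = (hl > hr ? hl : hr) + 1;
}

node* rotateright(node* p)
{
    node* q = p->left;
    p->left = q->right; q->right = p;
    fixheight(p); fixheight(q);
    return q;
}

node* rotateleft(node* q)
{
    node* p = q->right;
    q->right = p->left; p->left = q;
    fixheight(q); fixheight(p);
    return p;
}

node* balance(node* p)
{
    fixheight(p);
    if( bfactor(p) == 2 )
    {
        if( bfactor(p->right) < 0 ) p->right = rotateright(p->right);
        return rotateleft(p);
    }
    if( bfactor(p) == -2 )
    {
        if( bfactor(p->left) > 0 ) p->left = rotateleft(p->left);
        return rotateright(p);
    }
    return p;
}

node* insert(node* p, int k)
{
    if( !p ) return new node(k);
    if( k < p->key ) p->left = insert(p->left, k);
    else p->right = insert(p->right, k);
    return balance(p);
}

// Insert 1..7 in ascending order: a plain BST would degenerate into a
// chain of height 7, while the AVL tree ends up perfectly balanced
// (4 at the root) with height 3.
int ascending_demo()
{
    node* root = 0;
    for( int k = 1; k <= 7; ++k )
        root = insert(root, k);
    return root->height;
}
```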
To check that the implemented insertion algorithm matches the theoretical estimates of AVL-tree height, a simple computational experiment was carried out. An array of randomly shuffled numbers from 1 to 10,000 was generated, these numbers were then inserted one by one into an initially empty AVL tree, and the tree height was measured after each insertion. The results were averaged over 1000 runs. The following graph shows the dependence on n of the average height (red line), the minimum height (green line) and the maximum height (blue line). In addition, the upper and lower theoretical bounds are shown.

It can be seen that for random key sequences the experimentally obtained heights fall within the theoretical bounds, even with a small margin. The lower bound is attained (at least at some points) if the original key sequence is sorted in ascending order.
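A miniature single-run version of that experiment can be sketched as follows. This is my own code, not the author's benchmark; it duplicates the insertion routines from the post and uses a fixed random seed so the run is reproducible:

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Self-contained copy of the insertion code from the post.
struct node
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

unsigned char height(node* p) { return p ? p->height : 0; }
int bfactor(node* p) { return height(p->right) - height(p->left); }

void fixheight(node* p)
{
    unsigned char hl = height(p->left), hr = height(p->right);
    p->height = (hl > hr ? hl : hr) + 1;
}

node* rotateright(node* p)
{
    node* q = p->left;
    p->left = q->right; q->right = p;
    fixheight(p); fixheight(q);
    return q;
}

node* rotateleft(node* q)
{
    node* p = q->right;
    q->right = p->left; p->left = q;
    fixheight(q); fixheight(p);
    return p;
}

node* balance(node* p)
{
    fixheight(p);
    if( bfactor(p) == 2 )
    {
        if( bfactor(p->right) < 0 ) p->right = rotateright(p->right);
        return rotateleft(p);
    }
    if( bfactor(p) == -2 )
    {
        if( bfactor(p->left) > 0 ) p->left = rotateleft(p->left);
        return rotateright(p);
    }
    return p;
}

node* insert(node* p, int k)
{
    if( !p ) return new node(k);
    if( k < p->key ) p->left = insert(p->left, k);
    else p->right = insert(p->right, k);
    return balance(p);
}

// One run of the experiment: shuffle the keys 1..n, insert them all,
// and check that the final height lies between the theoretical bounds
// log2(n + 1) and 1.44 * log2(n + 2) - 0.328.
bool height_within_bounds(int n)
{
    std::vector<int> keys(n);
    for( int i = 0; i < n; ++i ) keys[i] = i + 1;
    std::mt19937 gen(42); // fixed seed for reproducibility
    std::shuffle(keys.begin(), keys.end(), gen);
    node* root = 0;
    for( int k : keys ) root = insert(root, k);
    double h = root->height;
    return h >= std::log2(n + 1.0) && h <= 1.44 * std::log2(n + 2.0) - 0.328;
}
```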
Deleting keys
With deleting nodes from an AVL tree, unfortunately, things are not as rosy as with randomized search trees. I was able neither to find nor to invent a deletion method based on merging (joining) two trees.

Therefore I took as the basis the variant described almost everywhere (and usually used when deleting nodes from an ordinary binary search tree). The idea is as follows: we find the node p with the given key k (if we don't find it, nothing needs to be done), in its right subtree we find the node min with the smallest key, and replace the deleted node p with the found node min.
There are several subtleties in the implementation. First of all, if the found node p has no right subtree, then by the AVL property this node can have only a single child node on the left (a tree of height 1), or node p may even be a leaf. In both of these cases we simply delete node p and return a pointer to its left child as the result.
Suppose now that p does have a right subtree. We find the minimal key in this subtree. By the property of the binary search tree, this key is located at the end of the left branch starting from the root of the subtree. We apply a recursive function:

```cpp
node* findmin(node* p) // find the node with the minimal key in tree p
{
    return p->left ? findmin(p->left) : p;
}
```
Another helper function removes the minimal element from the given tree. Again, by the AVL property, the minimal element either has a single node hanging on its right or has an empty right subtree. In both cases we just need to return the pointer to its right child and perform balancing on the way back (when returning from the recursion). The minimal node itself is not deleted, since we still need it.

```cpp
node* removemin(node* p) // remove the node with the minimal key from tree p
{
    if( p->left == 0 )
        return p->right;
    p->left = removemin(p->left);
    return balance(p);
}
```
Now everything is ready to implement removal of a key from the AVL tree. First we look for the needed node, performing the same actions as when inserting a key:

```cpp
node* remove(node* p, int k) // remove key k from the tree rooted at p
{
    if( !p ) return 0;
    if( k < p->key )
        p->left = remove(p->left, k);
    else if( k > p->key )
        p->right = remove(p->right, k);
```
As soon as key k is found, we switch to plan B: remember the roots q and r of the left and right subtrees of node p; delete node p; if the right subtree is empty, return a pointer to the left subtree; if the right subtree is not empty, find the minimal element min there, extract it with removemin, hang q on its left and the result of removemin(r) on its right, and return min after balancing it.

```cpp
    else // k == p->key
    {
        node* q = p->left;
        node* r = p->right;
        delete p;
        if( !r ) return q;
        node* min = findmin(r);
        min->right = removemin(r);
        min->left = q;
        return balance(min);
    }
```
When leaving the recursion, do not forget to perform the balancing:

```cpp
    return balance(p);
}
```
That's all! In principle, the search for the minimal node and its extraction can be implemented in one function, at the cost of solving the (not very difficult) problem of returning a pair of pointers from the function; this saves one pass over the right subtree.
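One practical remark: the code in this post never frees the remaining nodes when you are done with the whole tree. A recursive helper along these lines (my addition, not part of the original implementation) disposes of an entire tree:

```cpp
struct node // same node structure as in the post
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

// Post-order traversal that deletes every node and reports how many
// nodes were freed (handy as a leak check in tests).
int destroy(node* p)
{
    if( !p ) return 0;
    int n = destroy(p->left) + destroy(p->right);
    delete p;
    return n + 1;
}
```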
Obviously, the insertion and deletion operations (as well as the simpler search operation) run in time proportional to the height of the tree: in the course of these operations we descend from the root to some node, and a fixed number of actions is performed at each level. Because the AVL tree is balanced, its height depends logarithmically on the number of nodes. Thus the running time of all three basic operations is guaranteed to be logarithmic in the number of tree nodes.
Thank you all for your attention!
All code

```cpp
struct node // a node of the AVL tree
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

unsigned char height(node* p)
{
    return p ? p->height : 0;
}

int bfactor(node* p)
{
    return height(p->right) - height(p->left);
}

void fixheight(node* p)
{
    unsigned char hl = height(p->left);
    unsigned char hr = height(p->right);
    p->height = (hl > hr ? hl : hr) + 1;
}

node* rotateright(node* p) // right rotation around p
{
    node* q = p->left;
    p->left = q->right;
    q->right = p;
    fixheight(p);
    fixheight(q);
    return q;
}

node* rotateleft(node* q) // left rotation around q
{
    node* p = q->right;
    q->right = p->left;
    p->left = q;
    fixheight(q);
    fixheight(p);
    return p;
}

node* balance(node* p) // balancing of node p
{
    fixheight(p);
    if( bfactor(p) == 2 )
    {
        if( bfactor(p->right) < 0 )
            p->right = rotateright(p->right);
        return rotateleft(p);
    }
    if( bfactor(p) == -2 )
    {
        if( bfactor(p->left) > 0 )
            p->left = rotateleft(p->left);
        return rotateright(p);
    }
    return p; // no balancing needed
}

node* insert(node* p, int k) // insert key k into the tree rooted at p
{
    if( !p ) return new node(k);
    if( k < p->key )
        p->left = insert(p->left, k);
    else
        p->right = insert(p->right, k);
    return balance(p);
}

node* findmin(node* p) // find the node with the minimal key in tree p
{
    return p->left ? findmin(p->left) : p;
}

node* removemin(node* p) // remove the node with the minimal key from tree p
{
    if( p->left == 0 )
        return p->right;
    p->left = removemin(p->left);
    return balance(p);
}

node* remove(node* p, int k) // remove key k from the tree rooted at p
{
    if( !p ) return 0;
    if( k < p->key )
        p->left = remove(p->left, k);
    else if( k > p->key )
        p->right = remove(p->right, k);
    else // k == p->key
    {
        node* q = p->left;
        node* r = p->right;
        delete p;
        if( !r ) return q;
        node* min = findmin(r);
        min->right = removemin(r);
        min->left = q;
        return balance(min);
    }
    return balance(p);
}
```
Sources
- B. Pfaff, "An Introduction to Binary Search Trees and Balanced Trees" (the documentation of the libavl library)
- N. Wirth, "Algorithms and Data Structures" (the balanced trees described there are exactly AVL trees)
- T. Cormen et al., "Introduction to Algorithms" (AVL trees appear in the exercises to the chapter on red-black trees)
- D. Knuth, "The Art of Computer Programming" (section 6.2.3 is devoted to the theoretical analysis of AVL trees)