Degrees are the key to fast hierarchy in a relational database

After the publication of my first article on Habré, about one of the ways of organizing a hierarchy in a relational database, I still have the feeling of not finishing the job.
Judging by the comments, someone accepted the proposed method after another, asked what they were not comfortable with “django-mttp”, told about the support of trees in PostgreSQL ...
Thanks to everyone who has unsubscribed, but because of the chaotic presentation in the article itself, I think that I could not convey to the reader what I wanted. And “if I decided something, I will drink it without fail” (c)

Therefore, I decided on another attempt to present the approach of interest to me. Namely, the hierarchy is stored in a numeric code calculated on the basis of data on the dimension of the tree. That is, the maximum number of Levels and the number of Children of each Parent are predetermined (the possible ranges are quite large, therefore, you should not be intimidated by this). With such input, the code of each hierarchical element will be the way to it, and include the range of all Children. And this promises speed, and a lot more then ...
Further - with pictures and tables, without being tied to any database (for this is not important).

Hierarchy storage methods

A description and comparison of the most common methods for implementing hierarchies in relational databases can be found in the excellent articles by Mikhail Stadnik aka Mikhus:
')
Hierarchical Data Structures and Doctrine
Hierarchical data structures and performance

Who has not read but is interested in this issue - I recommend. The accessible exposition and the text rich with examples save me from the need to retell everything (all the more so I will not succeed so well).

For the introduction, in my opinion, it is already enough, so I turn to the essence of the question.

Why did I get it into my head

So, I still needed to keep the hierarchy once. Initially, a simple and simple method of storing a key in a string was chosen. That is, the tree should look like this:

Pic1

It was supposed to have two tables - Groups and Elements (now it doesn’t matter what). The hierarchical table with Groups is relatively (!) Small and rarely changes. The table with the Elements, on the contrary, allows for a large number of them and actively rules, more precisely, it grows more. And even more actively from it are sampling within the boundaries of the Groups defined in the first table.
I deliberately do not start up the lengthy descriptions of the chosen method, its validity, the purpose of the database and the entire application. It doesn’t matter to my story today. (besides, the method proposed by me further can be used when working with one table)

Table - Groups
1. ID - Char
2. PARENT - Char - Reference to ID
...

Table - Items
1. ID - Integer
2. GROUP - Char - Link to ID in the Group table
...

This approach works. Good or bad, another question. We must remember that there are always pros and cons. For the intended application, this method suited me.

After some reflections, it occurred to me that character strings could easily be replaced with integer keys (Yes! It will work faster, take up less space, and be easier to implement, and just look beautiful ...). The main thing to provide non-intersecting nested ranges. Why not?

Then we will discuss only one table - hierarchical.

Initially, I tried to portray the whole tree as codes in the binary system. Happened. Having understood the essence, I drew a small tree with 3rd levels and 3rd possible Children at each Element. It looked like this (the system is, of course, decimal):

Pic2

Looks certainly not familiar.
The numbers on the top - show the size of the table, for convenience, on the left - Levels.
Each number is an Element ID (bold).
Children at the Element - below and to the right until the next Element at the same level. For example, Children at Element = 16 is a range of 16<ID<(31+1)

In Figure 2, the tree is already fully filled so that it is visible.

Here is a piece of wood in a more traditional way:

Pic.3

In this picture, it is more difficult to catch periodicals right away, and the tree is still not visible. Therefore, I will sign another column on the left, which will help us further, and will show the tree in a more “native” form:

Pic.4

It is very easy to notice dependencies.

In any Element - there is a 4-degree, that is ( + 1) .
At the highest, 1st level, it is 4^2 , at the 2nd level - 4^1 , at the 3rd level - 4^0 .

In other words, ( + 1) ( - ) ( + 1) must be raised to a degree equal ( - ) . The required degree is indicated in the “red” column. But it’s not just necessary to build, but also to add to the Parent as many times as he is a child.

For a change, I will show a table for a tree, with the number of Levels = 4 and Children = 2:

Pic.5

By analogy, you yourself can already understand what the basis and degree is here.

If there are difficulties, a small example in Figure 5 ...

Suppose we have an element with code 27 . No children yet.

To get the 1st Child, you need:
27 + ((2+1)^2) * 1 = 36
To get the 2nd Child:
27 + ((2+1)^2) * 2 = 45

etc. etc…

That's all. The rest is a matter of technology.

Calculating the id of a new child

If the tree is full, then it is self-sufficient. That is, knowing its dimension, we can calculate the Level of the Element and its place in the Parent node by ID .

But we just have to create a tree, and without extra queries to the database.

Therefore, I entered the COUNT_CHILDS field so that you can take the next free ID for the Child.
Also, to calculate the ID, you will need to know the current Parent Level. This does not require additional requests (by adding a new Child we are “on” the Parent), so I added the LEVEL field. The level can also be calculated, who is interested, they can easily do it. But I just entered an additional field.

As a result, we have the following hierarchical table structure:
ID - Integer
PARENT - Integer - link to ID
LEVEL - Integer
COUNT_CHILDS - Integer

Let the tree dimension be denoted by:
LV - Number of Levels
CH - Number of Children per Parent

id - Child Computed Code

Then, having data on the Parent, the next Child can get:

id = ((CH+1)^(LV-LEVEL-1)) * (COUNT_CHILDS+1) + ID

Restrictions

All this is good, but the integers are not infinite, and have a very specific range. For example, int32 can store 2^32=4 294 967 296 values. How can this affect the applicability of this method?
The answer is, as always, it all depends on the current task. But, I dare to assume, for most tasks, that's enough.

The maximum required key value for storing the Elements in such a structure can be found by the formula:
(+1)^

For example, for 6 Levels, with int32, you can afford 39 Children.
(39+1)^6 = 4 096 000 000

Just that amount suited me.
Less Levels - there will be more Children, and vice versa. To whom it is interesting - consider yourself, if the result is less than the maximum permissible integer you have, then everything is fine.

The use of int64 greatly expands the allowable scope. In this case, 1624 Children are already available for 6 Levels. The speed of work with such keys will not slow down much (sometimes it will not slow down at all).

The squad did not notice the loss of a fighter

The attentive reader will see that the most “left” range, disappears for use.
For example, in the tree, in Fig.2, the range of numbers from 0 to 16 is not involved. You can, of course, ponder with the initial element (we will talk about it later) in algorithms. But I did not - enough.

Initial element

In order for all this to work, the initial element of the Tree must be added to the database. It is from him that the “countdown” will begin. I just bring it “hands” to the table. How you will do it is not important, the main thing is to have an understanding of what you are doing and why.

So, for all the above illustrations, the initial element will be:
ID = 0
PARENT = 0
LEVEL = 0
COUNT_CHILDS = 0

Yes, he is not on top, as is customary to depict it. He seems to be "on the left." But this approach works, and this is important.

The sign does not change anything

Not all databases allow you to store an unsigned integer. But it absolutely does not affect the algorithms. Both on the calculation of the new Element and on the selection of nested ones.
Here is our example where the Tree is “shifted” to the negative range:

Fig. 6

For this example, the initial Element should be done like this:
ID = -32
PARENT = -32
LEVEL = 0
COUNT_CHILDS = 0

The 32-bit integer allows you to store from -2 147 483 647 to 2 147 483 648

We want the “maximum”, 6 Levels with 39 Children for each element.
Then we get the initial id=-((39+1)^6)/2 , and manually enter the Group into the database (or as you like) with the following values:
ID = -2,048,000,000
PARENT = -2,048,000,000
LEVEL = 0
COUNT_CHILDS = 0

Everything will work just as well.

Algorithm implementations

Perhaps this method has already been described and implemented by someone.
I, because of my darkness, unfortunately, did not read. Everything "invented" himself.
At the end of this article, I will give a link to the code in which the whole mechanism works: insert, transfer, delete.

Further description of the key points in the algorithms.

Insert new item

Tree dimension

CH - Permissible Number of Children for Each Element
LV - Permissible Number of Levels in the Tree

Again, this must be determined in advance. How - read the article again.

Have a Parent Record

ID - Integer
PARENT - Integer - link to ID
LEVEL - Integer
COUNT_CHILDS - Integer

Then you can get a new child by the formula:

id = ((CH+1)^(LV-LEVEL-1)) * (COUNT_CHILDS+1) + ID

Insertion is not done if the Parent cannot [more] have Children.

Transferring the Element with all the Children to another Parent

In fact, everything is easy.
During the transfer, we make changes to the record fields: id, parent, level, at the element itself, and for all possible Children. Everything is done in one request.

We need:
1. Portable Element - K , of course we have it
2. Parent - P , similarly
3. New Parent - NP , this we also have
4. AllowedNumber of Children in a Tree - CH , we always have it - constant
5. Difference in levels - dL , calculated from already available data.

Having the above, we simply UPDATE to the element and to all its possible children, calculating new IDs using the formula:

Id = (KP)*(CH+1)^dL+NP

The PARENT field is calculated similarly, the LEVEL field is changed according to dL.

Here is the key piece from the django code:

 x_tmp = Group.objects.filter(id__gte=oldElement.id) .filter(id__lt = limit_right) .update( id=((F('id')-id_departure)*((CH+1)**delta_level) + id_destination), parent = ((F('parent')-id_departure)*((CH+1)**delta_level) + id_destination), level = (F('level')-delta_level) )

We do not transfer if:
1. A New Parent Cannot [more] Have Children ...
2. or ... The portable Element has Children who will be “below” the allowable Level in the Tree. This situation will occur only when the transfer is “down”. How you handle it, simply prohibit the transfer of such an Element, or “cut” the Children to an acceptable level - it's up to you. I do not transfer such an Element.

Deletion

Delete only Element that does not have Children.
When you delete an item - reduce the number of children at the parent
Even nothing more to write.

Dug a hole - go to sleep

The children of each Parent are a sequence.
A situation may arise when, as a result of deletion or transfer, a “hole” is formed in this sequence. In the presence of a large number of permissible children, you can score on it. But it's not right. Therefore, during removal or transfer, it is necessary to catch such cases and “transfer” to the vacant position of the extreme right Child.

The algorithm here is a special case of the above Transfer. Only easier.

Application area

Everywhere where endless hierarchy is not needed.
~~and when is it needed in everyday applications?~~

Minuses:
- Restriction on the number of Children in the Group
- Limit on the number of Levels in the Tree

Some combinations of Children and Groups for int32 and int64:

Pic.7

Perhaps, looking at these numbers, someone decides that all is not so bad ...

Pros:
- Work speed
- small amount of data
- Almost always you can do with a single request to a range of keys
- Ease of implementation

I did not give comparisons with traditional methods. ~~In my humble opinion, if the above disadvantages are not fundamental, this is one of the best approaches that you can use.~~
Who has the desire - try to compare.

Thank you for your attention.
I apologize for any errors and typos.
I would be grateful for any comments.

PS
I tried to talk about the approach, the idea so to say ... In practice, this method has not yet been seriously tested. The issues of data integrity are not addressed (they are similar in other methods). And a lot more then ...

UPD 10/09/2015
Wrote treensl application for Django (django-treensl).
Sources github.com/EvgeniyBurdin/django_treensl
The logic of working with the primary key is in the database.
Currently implemented only for PostgreSQL 9.1 and higher.

Source: https://habr.com/ru/post/166699/

All Articles