Hierarchy in DB on primes

Hello, dear.

The task is pretty trite - to store the tree in a relational database. It is not difficult in itself, but some practical questions make the brain wrinkle. There are a lot of interesting solutions, but another one has occurred to me. I'm not sure that it is original, but I haven’t seen such a decision.
Think of a way to store a tree in the database - not difficult. It is difficult to come up with so that the data was easy to get. Here are some of the most typical questions:

Check whether the node is a descendant of the specified node.
Select all the descendant nodes of this node.
Select all ancestors of a given node

The first two points in the described solution are performed by a very simple query, without any associations and subqueries, regardless of the depth of the hierarchy. . On the last point - have not yet had time to think :(

Data structure

I proceed not from the most typical data structure, and therefore I want to explain it a little. There is a table of entities (it completely depends on the subject area, let it be, let's say, a list of goods). In each row we add another field - the identifier of the category of goods. But the categories themselves are in a different table. This situation is slightly different from a fairly common scheme, where there is one table describing all entities, and for each entity a field is entered that clarifies whether this entity is a descendant node or a childless node (tree leaf).
In other words, the leaves are in one table, all the nodes are in another table.
Typically, in order to describe the hierarchy of nodes, the parent field is additionally entered in the table, in which the identifier of the parent node is written, or NULL if the node is a root (or even the root is only implied, and for “first shoots” of the root parent = NULL is specified ).

The essence of the task

If we neglect the order of the nodes on the same level (all the “brotherly” nodes), as most tasks allow, then the structure described above is sufficient for an exhaustive (within the scope of the task) structure description. This assumption is made for simplicity; to take into account the order of the brothers, the knot table will need to be slightly supplemented, and these additions can be easily compiled by anyone.
Problems, however, begin when it is required to turn something like that, for example, to display all the subnodes of the current node for some levels of offspring. To do this, you have to refer to the structure table (nodes) several times, each time choosing the right node according to a given hierarchy. For example, a set of identifiers of the first descendants is returned, then for each received identifiers we repeat the operation, etc. to the desired depth of the hierarchy.
In other words, it is necessary to use recursive functions. To avoid such an ugly and resource-intensive solution, invent various additional tables in which this auxiliary information would be stored. Pretty good solution, but, sorry, cumbersome. I would like something elegant, simple, and that the requests look very clear.
Several solutions exist, I'll remember a couple at a glance:

Numbering all nodes in a specific traversal order. ~~Unfortunately, I lost the link to the source.~~ Thanks adontz for the link. Allows you to make very simple and intuitive selects, but there is one drawback: you need to enumerate the nodes for any change in the hierarchy (for example, creating a new product category)
Store full path to current node. Resource-intensive and not very convenient. Alternatively, the path is “encoded” as a number in which each digit indicates the identifier of a particular ancestor. For example, the number “53” could indicate that the current node is in the fifth subgroup of the third group. The obvious disadvantage of such a presentation is strict restrictions on the number of groups / subgroups.

My decision

Immediately, I’ll make a reservation that it is possible (even likely) that it’s not exactly mine and was invented and used by someone, but I didn’t copy and paste from anywhere, and therefore let me call this solution “mine” :)
So, my solution is a variation of the second type of solutions, but is devoid of many of the drawbacks of most of the existing implementations. True, it has its own;) But according to sound reasoning, the hierarchs with the number of knots are not palpable, I think, of the order of several hundred thousand. But more about that below.

I suggest:

In the table of nodes, as identifiers, use prime numbers, and in the “parent” field - the product of identifiers of all ancestors.
')

What does this give?

The graceful construction of some queries, since it is possible for one action to find out exactly whether certain two nodes are in the relationship “parent” - “descendant”.
As you know , every complex number is uniquely decomposed into prime factors. Consequently, a sign of the existing parent-child relationship ( regardless of the distance in the hierarchy ) will be the divisibility of the “parent” field of the intended child to the identifier of the intended parent. Or, slightly changing the wording, the expression

parent MOD < > = 0

A couple of examples

The category (node) table has the simplest structure:
CREATE TABLE categories (`id` INT(11), `parent` INT(11), `name` VARCHAR(50), PRIMARY KEY `id`);
Selection of all nodes that are specified among ancestors:

SELECT * FROM `categories`
WHERE `parent` MOD <`id` > = 0;

To calculate the parent field for a new node, you just need to multiply the id and parent fields of the newly created parent.

disadvantages

Not still thought;)
Prime numbers are scary beasts, they are not easy to hunt ...
Additionally, a restriction is imposed on the complexity of the hierarchy in the sense that each parent field is a multiplication of identifiers of all parents, and a well-developed hierarchy will require a fairly large field size (albeit, solved using BIGINT)
Inconvenient search for all ancestors: you have to factorize the parent field

Regarding the first point - here I ask for help from habra people :) As a main task, I suggest estimating the various queries required to work with the product catalog.

On the second point I want to explain a little. First, about the number of primes : in the range from 1 to one billion is about 50 million primes. In other words, in 10 bits (by default, MySQL uses 11 bits for INT), you can cram 50 million nodes. However, the situation with the “parent” field is more sad: if we have 50 million nodes, then the size of INT (11) may not be enough: (... Enough or not - depends on the type of specific structure.

Generating simple numbers is an ungrateful task, but for the product catalog (and there are usually no more than a few hundred groups / subgroups), it is completely solvable.

If you go a little beyond the catalog of goods, then you can use a table of prime numbers to get a new simple number (identifier of a new node). Those. directly in the database to get a table of prime numbers, one field - the sequence number, the second - the corresponding prime number. Having estimated the power of the hierarchy in advance, you can create a table of prime numbers of suitable size.

However, as far as I can tell, the change in the hierarchy of categories occurs much less frequently than the extraction of data about the hierarchy.

Source: https://habr.com/ru/post/80965/

All Articles