📜 ⬆️ ⬇️

Degrees - the key to fast hierarchy (Django example)

There are many ways to organize hierarchical data storage. Recently, I was interested in the question on the structure of the catalog, for example, an online store. Namely, when Groups and Products are stored in different tables.
When navigating a visitor in Groups, the Products should be displayed from all Subgroups.

It would be desirable, having the Group code, to receive a quick query to the Product table, which would result in Products from the current Group and all its Subgroups.

Traditionally, the hierarchical group table looks like this:
1. ID - code
2. PARENT - parent code
3. NAME
...

Then, in order to receive the Goods from all Subgroups, you will need to bypass the subtree, collect all the codes of the Children and insert them into the query to the Product table. The method is the slowest to read.
')
You can also add a string field to the Group table:
4. ID_STR

It encodes the group hierarchy. It turns out like this:

1. A;
1.1. AA; 1.2. AB;
2. In;
2.1. VA; 2.2. BB;
...

This field is the key of the Group in the Product table.
Reading with such an organization is faster, but changing the hierarchy requires additional costs. Since the revisions are a priori less than readings, you can put up with it.

Thus, we can not make a traversal of the Group tree for the withdrawal of Products from all Subgroups, and do everything in one query.

Minuses:
- It is necessary to manually calculate the new string code.
- When transferring / deleting a group, rearrange the table of goods (change the string key).
- The limited options for the string key (in many respects, of course, depends on the coding system chosen, and in most cases this disadvantage can be neglected).
- Also, the disadvantages of this approach include the need to work with string data, which entails an additional load on the server.

You can get rid of the last flaw by making this field numeric.
After a little thought, I sketched test plates (Children - below and to the right of the Parent):

image
Pic1

The group table will look like this:
1. ID - integer code
2. PARENT - parent code
...

Product Table:
1. ID - code
2. GROUP - Group Code
...

Then all products from the Group and all Subgroups can be received in one request specifying a range of codes.

For example, for Products from Group 16 and all of its Subgroups (Figure 1, Table 2), you will need to select a range (GROUP> = 16) AND (GROUP <32)

The next drawback of this approach is the limited options. But is he principled?

The maximum 32-bit unsigned integer = 4,294,967,296.

If you use a key of this type, then, for example, with 6 Levels you can get 39 Children for each Parent, with 7 levels - 23, with 5 - 83, etc ...

The maximum required key value can be found by the formula:
(Number of Children + 1) ^ Number of Levels

If the database does not have the ability to store an unsigned integer, we can use a negative range, simply set the “upper” element (root of the tree) to be a negative number. To maximize the use of a 32-bit integer, with 6 levels and 39 children, this will be 2,048,000,000.

image
Fig. 2

For the example from the illustration above (Fig.2), the maximum unsigned integer is 64, if you use the negative range, then you will need to put root = -32

For most tasks, 4,294,967,296 is more than enough (at least for me). Need more levels - reduce the number of children, and vice versa. If such “frameworks” are still cramped for someone, you can take a 64-bit integer. The speed of working with such values ​​will be almost the same.

To check all this in work, I made the simplest Django application. In the model entered 2 additional fields - Level and Number of Unemployed Children. Also implemented the addition of a new element of the Group and its removal. The settings for the tree now stand there like this: Number of Children = 39, Number of Levels = 6. The key is a 32-bit integer. Of course, they can be changed (when changing the settings in the table there should be only one entry - root).
You also need to manually add the root element to the table.

#coding: utf-8 #       !!! # #  ,   ,   : # (+1)^ # #    (    ),      2 # # 32-      -2,147,483,647  2,147,483,648 # #  ,  6   39     #    id=-((39+1)^6)/2,         : # id = -2,048,000,000 # parent = -2,048,000,000 # level = 0 # count_childs= 0 # name = 'root' #      # #    int32  /: # 3/1624 # 4/255 # 5/83 # 6/39 # 7/23 # 8/15 #    ,     int64, .. BigIntegerField() LV = 6 #   ... CH = 39 # ...  from django.db import models from django.db.models.signals import post_delete #   ... from django.dispatch import receiver # ...       class Group(models.Model): id = models.IntegerField(primary_key=True) parent = models.ForeignKey('self') level = models.SmallIntegerField(blank=False, null=False) count_childs= models.SmallIntegerField(blank=False, null=False) name = models.CharField(max_length=150, blank=False, null=False) # ... def save(self, *args, **kwargs): #     ... if self.id==None: #     parent_obj=self.parent #        if (parent_obj.count_childs < CH) and (parent_obj.level < LV): #     self.id = ((CH+1)**(LV-parent_obj.level-1)) \ * (parent_obj.count_childs+1) + parent_obj.id #     1     self.level = parent_obj.level+1 #      self.count_childs = 0 #      parent_obj.count_childs+=1 parent_obj.save() else: # ... ,    ,  raise ValueError("     ") #   super(Group, self).save(*args, **kwargs) def __unicode__(self): return u'%s (lv=%s ch=%s id=%s)' % (self.name, self.level, self.count_childs, self.id) # END class Group #         @receiver(post_delete, sender=Group) def my_post_delete(sender, **kwargs): parent = (kwargs.get("instance")).parent if parent.count_childs > 0: parent.count_childs-=1 parent.save() 


To quickly check the work, you can make a test to get the tree as in Fig. 1. Table 2. To do this, you need to enter the root element with values ​​in the table:
id = 0
parent = 0
level = 0
count_childs = 0
name = 'root'
... and change the constants in the code above to LV = 3 CH = 3

Of course, in order for everything to work fully, you will need to add to the code:
- When transferring the Group, it is necessary to recalculate the codes of all Children. It is done in one request (UPDATE). It is necessary to control the transfer down, the tree is not rubber :)
- When deleting, if the Group is not “last right” in the Subgroup of the Parent, the vacant place must be filled with the rightmost Group of the same level. It is done in one request (UPDATE).

This has not yet been done (programming, for the last 15 years, only my hobby), but I see no obstacles to the implementation of these algorithms (the implementation “on paper” already exists).

Thus, it turned out a structure that works much faster than when using an encoded character key, well, and takes up less space.

When using int64, the size limit of the hierarchical table will be significantly lower. But int32 is enough for most tasks, after all, only Groups are stored in the hierarchy, and I haven’t yet met directories with parameters where Children> 39 and Levels> 6, (well, or didn’t notice).

I would be grateful for any comments.

UPD 01/20/2013
I apologize for editing the article, but I forgot to mention, about the not obvious, maybe the moment ... Since the code of any element, if there are data on the dimension of the tree (and we have them), carries all the information about its position, then Group transfers in a table are made in one request.
UPD 01/22/2013
Wrote a more detailed article about this method . It has links to files with the full implementation of the algorithm on Django

Source: https://habr.com/ru/post/165713/


All Articles