📜 ⬆️ ⬇️

Performance comparison of hierarchical models of Django and PostgreSQL

Good afternoon, dear readers.


Today's article will be devoted to comparing models of work with hierarchical data in PostgreSQL, through the Django application. In this article, I do not specifically use a clean implementation in the database, because I am interested in performance in an environment close to the combat one.


All tests were conducted on my data and the results are relevant for these data, on your data the result may differ from those obtained in the article.


Model implementation of hierarchical structures in the database


The following models can be used to work with such structures in PostgreSQL:


  1. Adjacency model (AM) - model when the parent is stored in the column;
  2. Nested Sets (NS) - model, when a range of all nested elements is stored in a pair of columns;
  3. Materialized Path model (MP) - model, when the full path to the element is stored in the column.

You can also read more about the implementation of hierarchical structures in a relational database here .


The following tools are selected for their implementation in Django:


  1. AM - Django-based ForeignKey-based recursion;
  2. NS - django-mptt module;
  3. MP - PostgreSQL ltree module with LtreeField wrapper;

Testing method


Testing will be conducted on a data set of 150 thousand companies. Time will be measured for
the following requests:


  1. Reading the whole tree
  2. Read arbitrary node
  3. Insert node
  4. Moving a subtree between levels

The query execution time will be considered the average execution time of 1000 queries for each item, to a randomly selected element of the tree.


Test bench hardware



Test bench software



Testing Tool Description


For testing created a separate Django application. It created 3 models, one for each storage scheme described above. This is done so that at any time it was possible to conduct similar testing on different data sets.


This application has two commands:



To run the application and collect statistics, you need to run a sequence of commands:


 python manage.py migrate python manage.py load_tree <    > python manage.py load_tree analize <-   > 

The results of the analize are stored in the report folder.


results


After executing the specified commands, for my data, I received the following results. To ensure that the names of the models do not mislead you, I will give below the correspondence between the models from the test and the storage models:



Reading the whole tree


read_tree_chart


As can be seen from the diagram, the Mptt and Ltree models showed approximately the same result, but the native Raw model was faster.


Read arbitrary node


read_node_chart


In this test, unlike the previous point, the Mptt model caught up in speed with the standard implementation, while the Ltree, on the contrary, worsened its result. It should be noted that the Raw model uses a recursive query (WITH RECURSION) to get descendants, and, in spite of this, it worked the fastest.


Insert node


insert_node_chart


When inserting a node, the Ltree model again fell behind, but in this case it was most likely due to the fact that the field of the tree where the tree was stored consisted of id records, so I had to make 2 queries (insert, and then update the path field), which affected accordingly on performance.


Moving a subtree between levels


move_node_chart


In the displacement of a node with a subtree, Mptt showed the worst result, most likely due to the fact that when moving it must recount all the fields of the transferred nodes, which is not a quick operation. Moving Raw is the fastest, because, in fact, it’s just updating one field. Ltree is not much behind the leader, as it must update the paths of all all nodes of the portable subtree.


Results


After comparing the implementation of hierarchical structures, I expected the implementation of the MP (Ltree) model to show the best result, but to my surprise, it showed only the second result, yielding to the native implementation of the AM (Raw) model. Well, the implementation of the NS (Mptt) model got the 3rd place, because in 2 tests out of 4 it lost by a large margin from the competitors.


Summary table with the results:


modelinsert_nodemove_noderead_noderead_tree
Ltree0.0019550.0103750.0087450.025522
Mptt0.0010060.8552930.001040.115597
Raw0.0010020.0010.0010120.021957

It should also be noted that the results may vary for other data sets and therefore it is recommended to conduct a similar test for your data before making a decision.


Articles used:


  1. Representing Trees in PostgreSQL
  2. Trees in SQL Paths and Materialized Path
  3. Storage of trees in the database.
  4. Benchmark tree structure for Django
  5. Hierarchical Data Structures and Doctrine
  6. Ltree
  7. Django-mptt
  8. LtreeFiled

PS This is a cross-post of the original article from my blog.

PS Updated tree reading schedule after adding sorts

')

Source: https://habr.com/ru/post/329204/


All Articles