Good afternoon, dear readers.
Today's article will be devoted to comparing models of work with hierarchical data in PostgreSQL, through the Django application. In this article, I do not specifically use a clean implementation in the database, because I am interested in performance in an environment close to the combat one.
All tests were conducted on my data and the results are relevant for these data, on your data the result may differ from those obtained in the article.
The following models can be used to work with such structures in PostgreSQL:
You can also read more about the implementation of hierarchical structures in a relational database here .
The following tools are selected for their implementation in Django:
Testing will be conducted on a data set of 150 thousand companies. Time will be measured for
the following requests:
The query execution time will be considered the average execution time of 1000 queries for each item, to a randomly selected element of the tree.
For testing created a separate Django application. It created 3 models, one for each storage scheme described above. This is done so that at any time it was possible to conduct similar testing on different data sets.
This application has two commands:
load_tree
- loads data for a testanalize
- performs data collection and analysis based on the specified number of requestsTo run the application and collect statistics, you need to run a sequence of commands:
python manage.py migrate python manage.py load_tree < > python manage.py load_tree analize <- >
The results of the analize
are stored in the report folder.
After executing the specified commands, for my data, I received the following results. To ensure that the names of the models do not mislead you, I will give below the correspondence between the models from the test and the storage models:
As can be seen from the diagram, the Mptt and Ltree models showed approximately the same result, but the native Raw model was faster.
In this test, unlike the previous point, the Mptt model caught up in speed with the standard implementation, while the Ltree, on the contrary, worsened its result. It should be noted that the Raw model uses a recursive query (WITH RECURSION) to get descendants, and, in spite of this, it worked the fastest.
When inserting a node, the Ltree model again fell behind, but in this case it was most likely due to the fact that the field of the tree where the tree was stored consisted of id records, so I had to make 2 queries (insert, and then update the path field), which affected accordingly on performance.
In the displacement of a node with a subtree, Mptt showed the worst result, most likely due to the fact that when moving it must recount all the fields of the transferred nodes, which is not a quick operation. Moving Raw is the fastest, because, in fact, it’s just updating one field. Ltree is not much behind the leader, as it must update the paths of all all nodes of the portable subtree.
After comparing the implementation of hierarchical structures, I expected the implementation of the MP (Ltree) model to show the best result, but to my surprise, it showed only the second result, yielding to the native implementation of the AM (Raw) model. Well, the implementation of the NS (Mptt) model got the 3rd place, because in 2 tests out of 4 it lost by a large margin from the competitors.
Summary table with the results:
model | insert_node | move_node | read_node | read_tree |
---|---|---|---|---|
Ltree | 0.001955 | 0.010375 | 0.008745 | 0.025522 |
Mptt | 0.001006 | 0.855293 | 0.00104 | 0.115597 |
Raw | 0.001002 | 0.001 | 0.001012 | 0.021957 |
It should also be noted that the results may vary for other data sets and therefore it is recommended to conduct a similar test for your data before making a decision.
PS This is a cross-post of the original article from my blog.
PS Updated tree reading schedule after adding sorts
Source: https://habr.com/ru/post/329204/
All Articles