Performance comparison of hierarchical models of Django and PostgreSQL

Good afternoon, dear readers.

Today's article will be devoted to comparing models of work with hierarchical data in PostgreSQL, through the Django application. In this article, I do not specifically use a clean implementation in the database, because I am interested in performance in an environment close to the combat one.

All tests were conducted on my data and the results are relevant for these data, on your data the result may differ from those obtained in the article.

Model implementation of hierarchical structures in the database

The following models can be used to work with such structures in PostgreSQL:

Adjacency model (AM) - model when the parent is stored in the column;
Nested Sets (NS) - model, when a range of all nested elements is stored in a pair of columns;
Materialized Path model (MP) - model, when the full path to the element is stored in the column.

You can also read more about the implementation of hierarchical structures in a relational database here .

The following tools are selected for their implementation in Django:

AM - Django-based ForeignKey-based recursion;
NS - django-mptt module;
MP - PostgreSQL ltree module with LtreeField wrapper;

Testing method

Testing will be conducted on a data set of 150 thousand companies. Time will be measured for
the following requests:

Reading the whole tree
Read arbitrary node
Insert node
Moving a subtree between levels

The query execution time will be considered the average execution time of 1000 queries for each item, to a randomly selected element of the tree.

Test bench hardware

CPU Core i5 2.5 GHz
RAM 1600 MHz DDR3
SSD Samsung 850 EVO 500GB

Test bench software

Python 2.7
Postgres 9.6
Django 1.8
psycopg2

Testing Tool Description

For testing created a separate Django application. It created 3 models, one for each storage scheme described above. This is done so that at any time it was possible to conduct similar testing on different data sets.

This application has two commands:

load_tree - loads data for a test
analize - performs data collection and analysis based on the specified number of requests

To run the application and collect statistics, you need to run a sequence of commands:

 python manage.py migrate python manage.py load_tree <    > python manage.py load_tree analize <-   >

The results of the analize are stored in the report folder.

results

After executing the specified commands, for my data, I received the following results. To ensure that the names of the models do not mislead you, I will give below the correspondence between the models from the test and the storage models:

Raw - adjacency model
Mptt - nested sets
Ltree - materialized path

Reading the whole tree

read_tree_chart

As can be seen from the diagram, the Mptt and Ltree models showed approximately the same result, but the native Raw model was faster.

Read arbitrary node

read_node_chart

In this test, unlike the previous point, the Mptt model caught up in speed with the standard implementation, while the Ltree, on the contrary, worsened its result. It should be noted that the Raw model uses a recursive query (WITH RECURSION) to get descendants, and, in spite of this, it worked the fastest.

Insert node

insert_node_chart

When inserting a node, the Ltree model again fell behind, but in this case it was most likely due to the fact that the field of the tree where the tree was stored consisted of id records, so I had to make 2 queries (insert, and then update the path field), which affected accordingly on performance.

Moving a subtree between levels

move_node_chart

In the displacement of a node with a subtree, Mptt showed the worst result, most likely due to the fact that when moving it must recount all the fields of the transferred nodes, which is not a quick operation. Moving Raw is the fastest, because, in fact, it’s just updating one field. Ltree is not much behind the leader, as it must update the paths of all all nodes of the portable subtree.

Results

After comparing the implementation of hierarchical structures, I expected the implementation of the MP (Ltree) model to show the best result, but to my surprise, it showed only the second result, yielding to the native implementation of the AM (Raw) model. Well, the implementation of the NS (Mptt) model got the 3rd place, because in 2 tests out of 4 it lost by a large margin from the competitors.

Summary table with the results:

model	insert_node	move_node	read_node	read_tree
Ltree	0.001955	0.010375	0.008745	0.025522
Mptt	0.001006	0.855293	0.00104	0.115597
Raw	0.001002	0.001	0.001012	0.021957

It should also be noted that the results may vary for other data sets and therefore it is recommended to conduct a similar test for your data before making a decision.

Articles used:

PS This is a cross-post of the original article from my blog.

PS Updated tree reading schedule after adding sorts

Source: https://habr.com/ru/post/329204/

All Articles