
Partitioning Model Tables in Django with PostgreSQL

Hey.
This post is about how to set up partitioning of a table by month relatively quickly and painlessly if you use Django + PostgreSQL. Much of it also applies to other frameworks and ORMs.

You can read about what partitioning is and why you might want it, for example, here, here and here.

So, there is a Django project, and the table of one of its models is expected to become very large. If this table is read frequently, and the time period in which a record was made is always known, partitioning will speed up the database.
Writing the queries to enable partitioning by hand every time is not appealing, so let's try to automate it. Better still if the result can be used by people who are not very familiar with SQL.

First I will describe how to quickly try out my work, and then what is under the hood. The plan is as follows:
  1. Hook into syncdb so that partitioning commands can be added.
  2. Connect SQL that will pull out the existing indexes, create the sections, enable indexes on them, and add the functions and triggers.
  3. Enable partitioning in the target applications and models.

First install the package from the repository.
 pip install git+https://github.com/podshumok/django-post-syncdb-hooks 

and connect a couple of applications:
 INSTALLED_APPS = (
     # ...
     'post_syncdb_hooks',
     'post_syncdb_hooks.partitioning',
     'your.app',
     # ...
 )

Suppose there is a model in yourapp/models.py :
 from django.db import models

 class MyModel(models.Model):
     date_created = models.DateTimeField()
     my_data = models.IntegerField(db_index=True)

add the file yourapp/sql/post_syncdb-hook.postgresql_psycopg2.sql (you can adjust the intervals to your needs):
 SELECT month_partition_creation(
     date_trunc('MONTH', NOW())::date,
     date_trunc('MONTH', NOW() + INTERVAL '1 year')::date,
     'yourapp_mymodel',
     'date_created');

Run syncdb :
 ./manage.py syncdb 
... and sections are created.

If this is the first syncdb , the created sections will have no indexes yet. To fix this, just run syncdb again.

Well, the database is now ready, but Django is not quite there yet. The thing is that, starting with version 1.3, Django constructs INSERT INTO queries for PostgreSQL with RETURNING ... appended, in order to get the id of the inserted record. And the partitioning method we use does not support this feature.

We can crudely force Django not to use RETURNING anywhere:
 from django.db import connections, DEFAULT_DB_ALIAS
 connections[DEFAULT_DB_ALIAS].features.can_return_id_from_insert = False

Or we can edit our model so that RETURNING is not used only for it:
 from post_syncdb_hooks.partitioning import to_partition

 class MyModel(models.Model):
     "..."
     #...
     @to_partition
     def save(self, *args, **kwargs):
         return super(MyModel, self).save(*args, **kwargs)

So, the database is ready and Django is ready, but are we ready? To keep read requests from polling all sections of the database, we must filter QuerySets by the field on which partitioning is set up (in the example, date_created ):
 qs = MyModel.objects.filter(date_created__lte=..., date_created__gt=...) 

You should also make sure that nothing, without real need, does a count() over all records: the paginator in the Django admin, for example, likes to do this.
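
If the admin is a concern, one workaround is a paginator that skips the exact count. Below is a minimal sketch of the idea (the class name and the substitute count value are my own, not part of the package):

 from django.core.paginator import Paginator

 class NoCountPaginator(Paginator):
     """Paginator that avoids SELECT COUNT(*) over all sections.

     Sketch only: instead of asking the database, it reports a large
     fixed number, which is enough for the admin change list to render.
     """

     @property
     def count(self):
         return 999999999  # substitute value; never touches the database

In a ModelAdmin it can then be plugged in via the paginator attribute.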

Now everything is ready.

Under the hood: post_syncdb_hooks

post_syncdb_hooks contains a management.py that connects a receiver to the post_syncdb signal. This receiver, or hook, is called for every installed application. It checks whether there is an sql folder next to the application's models.py and, if there is, whether it contains a post_syncdb-hook.sql or post_syncdb-hook.(backend).sql file that can be run.
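
Roughly, the hook boils down to something like this (a simplified sketch of the idea, not the package's exact code):

 import os

 from django.db import connection
 from django.db.models.signals import post_syncdb

 def run_sql_hooks(sender, **kwargs):
     """Execute <app>/sql/post_syncdb-hook.sql after syncdb, if present.

     Simplified sketch: `sender` is the models module of the app that was
     just synced; the real package also looks for the backend-specific
     variant post_syncdb-hook.<backend>.sql.
     """
     sql_dir = os.path.join(os.path.dirname(sender.__file__), 'sql')
     hook = os.path.join(sql_dir, 'post_syncdb-hook.sql')
     if os.path.exists(hook):
         with open(hook) as f:
             connection.cursor().execute(f.read())

 post_syncdb.connect(run_sql_hooks)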

post_syncdb_hooks.partitioning contains exactly such an SQL file. In it, we create an SQL function that takes four parameters: the start date, the end date, the table name, and the field name. When the function is called, sections for the specified table are created on the specified field, from the start date to the end date in monthly steps, along with a trigger for INSERT INTO and indexes on the section tables (if indexes are defined on the master table).
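
Since the sections are only created up to the end date passed in, new ones will eventually have to be added. The same function can simply be called again, for example from a cron-driven script (a sketch; the table and field names are the ones from the example above):

 from django.db import connection

 def extend_partitions():
     """Create monthly sections for the next year.

     Sketch: calls the month_partition_creation() function installed by
     post_syncdb_hooks.partitioning for the example table and field.
     """
     cursor = connection.cursor()
     cursor.execute("""
         SELECT month_partition_creation(
             date_trunc('MONTH', NOW())::date,
             date_trunc('MONTH', NOW() + INTERVAL '1 year')::date,
             'yourapp_mymodel',
             'date_created');
     """)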

Under the hood: indexes

The most tedious and difficult part for me was creating the indexes for the sections. When the SQL to create the table is written by hand and we move to a partitioned schema, there is no problem: copy-paste. But when Django creates the tables, it is not obvious how to make it skip creating indexes on the master table while keeping the corresponding SQL. In the end I decided: let Django create the indexes (they will stay empty anyway), and I will copy them onto the sections.

It remains to decide how to do this.

psql has the -E option, which makes it print the SQL generated by its internal commands. So:
 sudo -u postgres psql -E db 
 db=# \di
 ********* QUERY **********
 SELECT n.nspname as "Schema",
   c.relname as "Name",
   CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index'
        WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
   pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
   c2.relname as "Table"
 FROM pg_catalog.pg_class c
   LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
   LEFT JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid
 WHERE c.relkind IN ('i','')
   AND n.nspname <> 'pg_catalog'
   AND n.nspname <> 'information_schema'
   AND n.nspname !~ '^pg_toast'
   AND pg_catalog.pg_table_is_visible(c.oid)
 ORDER BY 1,2;
 **************************
 db=#

A little copy-paste and we have everything needed to create indexes for the section tables.
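
For illustration, the same ready-made definitions can also be pulled from the pg_indexes view and rewritten for a section; the package does this inside its SQL function, but a rough Python sketch of the idea looks like this (function and variable names are mine):

 from django.db import connection

 def copy_indexes_to_section(master_table, section_table):
     """Copy index definitions from the master table onto one section.

     Rough sketch: pg_indexes.indexdef holds the full CREATE INDEX
     statement, so we just substitute the table name (which also renames
     the index, since Django includes the table name in index names).
     """
     cursor = connection.cursor()
     cursor.execute(
         "SELECT indexdef FROM pg_indexes WHERE tablename = %s",
         [master_table],
     )
     for (indexdef,) in cursor.fetchall():
         cursor.execute(indexdef.replace(master_table, section_table))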

Thanks!

The package presented here is used by me in a couple of projects and copes quite successfully with the tasks set for it.
Thanks to everyone who read this. I hope it comes in handy for someone.
Also, please report any shortcomings you find in the comments or on GitHub, and suggest improvements and extensions.

Source: https://habr.com/ru/post/179873/

