Understanding partitions in PostgreSQL 9

PostgreSQL 10 was released in early October 2017, almost a year ago.

One of the most interesting new features is unconditionally declarative partitioning. But what if you are not in a hurry to upgrade to 10k? Amazon, for example, is not in a hurry, and introduced PostgreSQL 10 support only in the last days of February 2018.

Then the good old partitioning through inheritance comes to the rescue. I am a software architect of a finance department in a taxi company, so all the examples will somehow be related to travel (we’ll leave problems with money for another time).
')
Since we began to rewrite our financial system in 2015, when I just joined the company, there was no question of any declarative partitioning. So to this day, the technique described below is successfully used.

The original reason for writing this article was that most of the PostgreSQL partitioning examples I encountered were very basic. Here is a table, here is one column that we are looking at, and perhaps even know in advance what values lie in it. It would seem that everything is simple. But real life makes its own adjustments.

In our case, we partition the tables in two columns, one of which contains the travel dates. This is the case we consider.

Let's start with how our table looks like:

create table rides ( id bigserial not null primary key, tenant_id varchar(20) not null, ride_id varchar(36) not null, created_at timestamp with time zone not null, metadata jsonb -- Probably more columns and indexes coming here );

For each tenant, the table contains millions of rows per month. Fortunately, the data between the tenants never intersect, and the most difficult requests are made at a cut of one or two months.

For those who did not delve into how the partitions in PostgreSQL work (the lucky ones from Oracle, hello!) I will briefly describe the process.

PostgreSQL relies on its two “features” for this: the ability to inherit tables, table inheritance, and checked conditions.

Let's start with inheritance. Using the INHERITS keyword, we indicate that the table we create inherits all fields of the inherited table. It also creates a relationship between two tables: by making a request from parent, we also get all the data from the child.

Checked conditions complete the picture by ensuring that data is not intersected. Thus, the PostgreSQL optimizer can cut off part of the child tables, relying on data from the query.

The first pitfall of this approach would seem quite obvious: any request must contain a tenant_id. And nevertheless, if you don’t remind yourself of this constantly, sooner or later you will write custom SQL itself, in which you will forget to specify this tenant_id. As a result - scanning all partitions and non-functioning database.

But back to what we want to achieve. At the application level, I would like transparency - we always write in the same table, and already the database chooses where to put this data specifically.

To do this, we use the following stored procedure:

 CREATE OR REPLACE FUNCTION insert_row() RETURNS TRIGGER AS $BODY$ DECLARE partition_env TEXT; partition_date TIMESTAMP; partition_name TEXT; sql TEXT; BEGIN -- construct partition name partition_env := lower(NEW.tenant_id); partition_date := date_trunc('month', NEW.created_at AT TIME ZONE 'UTC'); partition_name := format('%s_%s_%s', TG_TABLE_NAME, partition_env, to_char(partition_date, 'YYYY_MM')); -- create partition, if necessary IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname = partition_name) THEN PERFORM create_new_partition(TG_TABLE_NAME, NEW.tenant_id, partition_date, partition_name); END IF; select format('INSERT INTO %s values ($1.*)', partition_name) into sql; -- Propagate insert EXECUTE sql USING NEW; RETURN NEW; -- RETURN NULL; if no ORM END; $BODY$ LANGUAGE plpgsql;

The first thing to notice is the use of TG_TABLE_NAME. Since this is a trigger, PostgreSQL fills in quite a few variables for us that we can access. Full list can be found here .

In our case, we want to get the parent name of the table on which the trigger worked. In our case it will be rides. We use a similar approach in several microservices, and this part can be transferred almost unchanged.

PERFORM is useful if we want to call a function that returns nothing. Usually, in the examples, all the logic is tried to be put into one function, but we try to be careful.

USING NEW indicates that in this query we are using values from the string we tried to add.

$1.* expands all the values of the new line In fact, this can be translated into NEW.* . What translates to NEW.ID, NEW.TENANT_ID, …

The following procedure, which we call using PERFORM , will create a new partition if it does not already exist. This will happen once a period for each tenant.

 CREATE OR REPLACE FUNCTION create_new_partition(parent_table_name text, env text, partition_date timestamp, partition_name text) RETURNS VOID AS $BODY$ DECLARE sql text; BEGIN -- Notifying RAISE NOTICE 'A new % partition will be created: %', parent_table_name, partition_name; select format('CREATE TABLE IF NOT EXISTS %s (CHECK ( tenant_id = ''%s'' AND created_at AT TIME ZONE ''UTC'' > ''%s'' AND created_at AT TIME ZONE ''UTC'' <= ''%s'')) INHERITS (%I)', partition_name, env, partition_date, partition_date + interval '1 month', parent_table_name) into sql; -- New table, inherited from a master one EXECUTE sql; PERFORM index_partition(partition_name); END; $BODY$ LANGUAGE plpgsql;

As described earlier, we use INHERITS to create a table similar to parent, and CHECK to determine which data should go there.

RAISE NOTICE simply prints the string to the console. If we run the INSERT from psql now, we can see if the partition has been created.

We have a new problem. INHERITS does not inherit indexes. For this, we have two solutions:

Create indexes using inheritance:
Use CREATE TABLE LIKE , and then ALTER TABLE INHERITS

Or create indexes procedurally:

 CREATE OR REPLACE FUNCTION index_partition(partition_name text) RETURNS VOID AS $BODY$ BEGIN -- Ensure we have all the necessary indices in this partition; EXECUTE 'CREATE INDEX IF NOT EXISTS ' || partition_name || '_tenant_timezone_idx ON ' || partition_name || ' (tenant_id, timezone(''UTC''::text, created_at))'; -- More indexes here... END; $BODY$ LANGUAGE plpgsql;

It is very important not to forget about child indexing of tables, since even after partitioning there will be millions of rows in each of them. Indexes on parent are not needed in our case, since the parent will always remain empty.

Finally, we create a trigger that will be called when creating a new line:

 CREATE TRIGGER before_insert_row_trigger BEFORE INSERT ON rides FOR EACH ROW EXECUTE PROCEDURE insert_row();

There is another subtlety that is rarely focused on. It is best to partition by columns, the data in which never change. In our case, it works: the trip never changes the tenant_id and created_at. The problem that occurs if it is not - PostreSQL will not return part of the data to us. We then promised him a CHECK that all the data is valid.

There are several solutions (except for the obvious - do not mutate the data for which we partition):

Instead of UPDATE , and at the application level, always do DELETE+INSERT
We add one more trigger on UPDATE , which will transfer data to the correct partition.

Another thing to consider is how to correctly index columns containing dates. If we use AT TIME ZONE requests, we must not forget that this is actually a function call. So, our index should be function based. I forgot. As a result, the base is again dead from the load.

The last aspect to consider is how partitions interact with various ORM frameworks, be it ActiveRecord in Ruby or GORM in Go.

PostgreSQL partitions rely on the fact that the parent table is always empty. If you do not use ORM, then you can safely return to the first stored procedure, and change RETURN NEW; on RETURN NULL ;. Then the row in the parent table simply does not add what we actually want.

But the fact is that most ORMs use INSERTs with RETURNING clause. If we return NULL from our trigger, ORM will panic, considering that the line is not added. It is added, but not where the ORM is looking.

There are several ways to get around this:

Do not use ORM for INSERTs
Patch ORM (which is sometimes advised in case of ActiveRecord)
Add another trigger that will remove the string from parent.

The last option is undesirable, since for each operation we will perform three. Nevertheless, it is sometimes inevitable, therefore we will consider it separately:

 CREATE OR REPLACE FUNCTION delete_parent_row() RETURNS TRIGGER AS $BODY$ DECLARE BEGIN delete from only rides where id = NEW.ID; RETURN null; END; $BODY$ LANGUAGE plpgsql;

 CREATE TRIGGER after_insert_row_trigger AFTER INSERT ON rides FOR EACH ROW EXECUTE PROCEDURE delete_parent_row();

The last thing left for us to do is test our solution. To do this, we generate a certain number of lines:

 DO $script$ DECLARE year_start_epoch bigint := extract(epoch from '20170101'::timestamptz at time zone 'UTC'); delta bigint := extract(epoch from '20171231 23:59:59'::timestamptz at time zone 'UTC') - year_start_epoch; tenant varchar; tenants varchar[] := array['tenant_a', 'tenant_b', 'tenant_c', 'tenant_d']; BEGIN FOREACH tenant IN ARRAY tenants LOOP FOR i IN 1..100000 LOOP insert into rides (tenant_id, created_at, ride_id) values (tenant, to_timestamp(random() * delta + year_start_epoch) at time zone 'UTC', i); END LOOP; END LOOP; END $script$;

And let's see how the database behaves:

 explain select * from rides where tenant_id = 'tenant_a' and created_at AT TIME ZONE 'UTC' > '20171102' and created_at AT TIME ZONE 'UTC' <= '20171103';

If everything went as it should, we should see the following result:

  Append (cost=0.00..4803.76 rows=4 width=196) -> Seq Scan on rides (cost=0.00..4795.46 rows=3 width=196) Filter: (((created_at)::timestamp without time zone > '2017-11-02 00:00:00'::timestamp without time zone) AND ((created_at)::timestamp without time zone <= '2017-11-03 00:00:00'::timestamp without time zone) AND ((tenant_id)::text = 'tenant_a'::text)) -> Index Scan using rides_tenant_a_2017_11_tenant_timezone_idx on rides_tenant_a_2017_11 (cost=0.28..8.30 rows=1 width=196) Index Cond: (((tenant_id)::text = 'tenant_a'::text) AND ((created_at)::timestamp without time zone > '2017-11-02 00:00:00'::timestamp without time zone) AND ((created_at)::timestamp without time zone <= '2017-11-03 00:00:00'::timestamp without time zone)) (5 rows)

Despite the fact that each tenant has a hundred thousand rows, we only select from the required data slice. Success!

I hope that this article was interesting for those who were not yet familiar with what partitioning is and how it is implemented in PostgreSQL. And those for whom this topic is no longer new, still learned a couple of interesting tricks.

UPD:
As bigtrot correctly noted, all this street magic will not work if the CONSTRAINT_EXCLUSION setting is turned off.

You can check this with the command

 show CONSTRAINT_EXCLUSION

Setup has three values: on, off and partition

Setting up a partition is more optimal if you suddenly like to use CHECK CONSTRAINTS not only for partitions, but also for normalizing data.

Source: https://habr.com/ru/post/422753/

All Articles

Understanding partitions in PostgreSQL 9

More articles: