Not very big data

This article will look at the features provided by inline or declarative partitioning in PostgreSQL version 12. The demonstration is prepared for the report of the same name at the HighLoad ++ Siberia 2019 conference.

All examples are performed on the recently appeared beta version:

=> SELECT version();

  version ------------------------------------------------------------------------------------------------------------------ PostgreSQL 12beta1 on i686-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 32-bit (1 row)

The examples use the tables bookings and tickets for the demo database. The booking table contains records for three months from June to August 2017 and has the following structure:
')

 => \d bookings

 Table "bookings.bookings" Column | Type | Collation | Nullable | Default --------------+--------------------------+-----------+----------+--------- book_ref | character(6) | | not null | book_date | timestamp with time zone | | not null | total_amount | numeric(10,2) | | not null | Indexes: "bookings_pkey" PRIMARY KEY, btree (book_ref) Referenced by: TABLE "tickets" CONSTRAINT "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref)

Reservations may include multiple tickets. The structure of the table with tickets:

 => \d tickets

  Table "bookings.tickets" Column | Type | Collation | Nullable | Default ----------------+-----------------------+-----------+----------+--------- ticket_no | character(13) | | not null | book_ref | character(6) | | not null | passenger_id | character varying(20) | | not null | passenger_name | text | | not null | contact_data | jsonb | | | Indexes: "tickets_pkey" PRIMARY KEY, btree (ticket_no) Foreign-key constraints: "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref) Referenced by: TABLE "ticket_flights" CONSTRAINT "ticket_flights_ticket_no_fkey" FOREIGN KEY (ticket_no) REFERENCES tickets(ticket_no)

This information should be sufficient to understand the examples in which we will try to make the tables partitioned.

→ You can learn more about the demo database here.

Range Partitioning

At the beginning, we will try to make the bookings table partitioned by date range. In this case, the table would be created like this:

 => CREATE TABLE bookings_range ( book_ref character(6), book_date timestamptz, total_amount numeric(10,2) ) PARTITION BY RANGE(book_date);

Separate sections for each month:

 => CREATE TABLE bookings_range_201706 PARTITION OF bookings_range FOR VALUES FROM ('2017-06-01'::timestamptz) TO ('2017-07-01'::timestamptz); => CREATE TABLE bookings_range_201707 PARTITION OF bookings_range FOR VALUES FROM ('2017-07-01'::timestamptz) TO ('2017-08-01'::timestamptz);

To specify the boundaries of the section, you can use not only constants, but also expressions, such as a function call. The value of the expression is calculated at the moment of creating the section and stored in the system catalog:

 => CREATE TABLE bookings_range_201708 PARTITION OF bookings_range FOR VALUES FROM (to_timestamp('01.08.2017','DD.MM.YYYY')) TO (to_timestamp('01.09.2017','DD.MM.YYYY'));

Table Description:

 => \d+ bookings_range

 Partitioned table "bookings.bookings_range" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+--------------------------+-----------+----------+---------+----------+--------------+------------- book_ref | character(6) | | | | extended | | book_date | timestamp with time zone | | | | plain | | total_amount | numeric(10,2) | | | | main | | Partition key: RANGE (book_date) Partitions: bookings_range_201706 FOR VALUES FROM ('2017-06-01 00:00:00+03') TO ('2017-07-01 00:00:00+03'), bookings_range_201707 FOR VALUES FROM ('2017-07-01 00:00:00+03') TO ('2017-08-01 00:00:00+03'), bookings_range_201708 FOR VALUES FROM ('2017-08-01 00:00:00+03') TO ('2017-09-01 00:00:00+03')

That's enough. No trigger is required to insert records, no CHECK restrictions are needed. The CONSTRAINT_EXCLUSION parameter is also not needed, it can even be disabled:

 => SET constraint_exclusion = OFF;

Filling with automatic layout in sections:

 => INSERT INTO bookings_range SELECT * FROM bookings;

 INSERT 0 262788

The declarative syntax still hides inherited tables, so the distribution of rows by sections can be viewed by the query:

 => SELECT tableoid::regclass, count(*) FROM bookings_range GROUP BY tableoid;

  tableoid | count -----------------------+-------- bookings_range_201706 | 7303 bookings_range_201707 | 167062 bookings_range_201708 | 88423 (3 rows)

And in the parent table there is no data:

 => SELECT * FROM ONLY bookings_range;

  book_ref | book_date | total_amount ----------+-----------+-------------- (0 rows)

Let's check section exclusion in terms of query:

 => EXPLAIN (COSTS OFF) SELECT * FROM bookings_range WHERE book_date = '2017-07-01'::timestamptz;

  QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on bookings_range_201707 Filter: (book_date = '2017-07-01 00:00:00+03'::timestamp with time zone) (2 rows)

Scan only one section, as expected.

The following example uses the to_timestamp function with the category of variability STABLE instead of a constant:

 => EXPLAIN (COSTS OFF) SELECT * FROM bookings_range WHERE book_date = to_timestamp('01.07.2017','DD.MM.YYYY');

  QUERY PLAN ------------------------------------------------------------------------------------ Append Subplans Removed: 2 -> Seq Scan on bookings_range_201707 Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text)) (4 rows)

The value of the function is calculated during the initialization of the query plan and part of the sections is excluded from viewing (the Subplans Removed line).

But this only works for SELECT. When changing data, the exclusion of sections based on the values of the STABLE functions is not yet implemented:

 => EXPLAIN (COSTS OFF) DELETE FROM bookings_range WHERE book_date = to_timestamp('01.07.2017','DD.MM.YYYY');

  QUERY PLAN ------------------------------------------------------------------------------------ Delete on bookings_range Delete on bookings_range_201706 Delete on bookings_range_201707 Delete on bookings_range_201708 -> Seq Scan on bookings_range_201706 Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text)) -> Seq Scan on bookings_range_201707 Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text)) -> Seq Scan on bookings_range_201708 Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text)) (10 rows)

Therefore, you should use constants:

 => EXPLAIN (COSTS OFF) DELETE FROM bookings_range WHERE book_date = '2017-07-01'::timestamptz;

  QUERY PLAN ---------------------------------------------------------------------------------- Delete on bookings_range Delete on bookings_range_201707 -> Seq Scan on bookings_range_201707 Filter: (book_date = '2017-07-01 00:00:00+03'::timestamp with time zone) (4 rows)

Sort by index

To perform the following query, it is necessary to sort the results obtained from different sections. Therefore, in the query plan, we see the SORT node and the high initial cost of the plan:

 => EXPLAIN SELECT * FROM bookings_range ORDER BY book_date;

  QUERY PLAN ------------------------------------------------------------------------------------------ Sort (cost=24649.77..25077.15 rows=170952 width=52) Sort Key: bookings_range_201706.book_date -> Append (cost=0.00..4240.28 rows=170952 width=52) -> Seq Scan on bookings_range_201706 (cost=0.00..94.94 rows=4794 width=52) -> Seq Scan on bookings_range_201707 (cost=0.00..2151.30 rows=108630 width=52) -> Seq Scan on bookings_range_201708 (cost=0.00..1139.28 rows=57528 width=52) (6 rows)

Create an index by book_date. Instead of one global index, indexes are created in each section:

 => CREATE INDEX book_date_idx ON bookings_range(book_date);

 => \di bookings_range*

  List of relations Schema | Name | Type | Owner | Table ----------+-------------------------------------+-------+---------+----------------------- bookings | bookings_range_201706_book_date_idx | index | student | bookings_range_201706 bookings | bookings_range_201707_book_date_idx | index | student | bookings_range_201707 bookings | bookings_range_201708_book_date_idx | index | student | bookings_range_201708 (3 rows)

The previous query with sorting can now use the index by partitioning key and return the result from different sections at once in sorted form. The SORT node is not needed and the minimum cost is required to issue the first result line:

 => EXPLAIN SELECT * FROM bookings_range ORDER BY book_date;

  QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Append (cost=1.12..14880.88 rows=262788 width=52) -> Index Scan using bookings_range_201706_book_date_idx on bookings_range_201706 (cost=0.28..385.83 rows=7303 width=52) -> Index Scan using bookings_range_201707_book_date_idx on bookings_range_201707 (cost=0.42..8614.35 rows=167062 width=52) -> Index Scan using bookings_range_201708_book_date_idx on bookings_range_201708 (cost=0.42..4566.76 rows=88423 width=52) (4 rows)

The indexes thus created on the sections are maintained centrally. When you add a new section on it will automatically create an index. And you cannot delete an index of only one section:

 => DROP INDEX bookings_range_201706_book_date_idx;

 ERROR: cannot drop index bookings_range_201706_book_date_idx because index book_date_idx requires it HINT: You can drop index book_date_idx instead.

Only whole:

 => DROP INDEX book_date_idx;

 DROP INDEX

CREATE INDEX ... CONCURRENTLY

When creating an index on a partitioned table, you cannot specify CONCURRENTLY.

But you can do the following. At first we create an index only on the main table, it will receive the invalid status:

 => CREATE INDEX book_date_idx ON ONLY bookings_range(book_date);

 => SELECT indisvalid FROM pg_index WHERE indexrelid::regclass::text = 'book_date_idx';

  indisvalid ------------ f (1 row)

Then we create indexes on all sections with the CONCURRENTLY option:

 => CREATE INDEX CONCURRENTLY book_date_201706_idx ON bookings_range_201706 (book_date); => CREATE INDEX CONCURRENTLY book_date_201707_idx ON bookings_range_201707 (book_date); => CREATE INDEX CONCURRENTLY book_date_201708_idx ON bookings_range_201708 (book_date);

Now we connect local indexes to global:

 => ALTER INDEX book_date_idx ATTACH PARTITION book_date_201706_idx; => ALTER INDEX book_date_idx ATTACH PARTITION book_date_201707_idx; => ALTER INDEX book_date_idx ATTACH PARTITION book_date_201708_idx;

This is similar to connecting table-sections, which we will look at later. As soon as all index sections are connected, the main index will change its status:

 => SELECT indisvalid FROM pg_index WHERE indexrelid::regclass::text = 'book_date_idx';

  indisvalid ------------ t (1 row)

Connecting and disconnecting sections

Automatic creation of sections is not provided. Therefore, they need to be created in advance, before records with new values of the partitioning key begin to be added to the table.

We will create a new section while other transactions are working with the table, at the same time we will look at the locks:

 => BEGIN; => SELECT count(*) FROM bookings_range WHERE book_date = to_timestamp('01.07.2017','DD.MM.YYYY');

  count ------- 5 (1 row)

 => SELECT relation::regclass::text, mode FROM pg_locks WHERE pid = pg_backend_pid() AND relation::regclass::text LIKE 'bookings%';

  relation | mode -----------------------+----------------- bookings_range_201708 | AccessShareLock bookings_range_201707 | AccessShareLock bookings_range_201706 | AccessShareLock bookings_range | AccessShareLock (4 rows)

The AccessShareLock lock is imposed on the main table, all sections and indexes at the beginning of the statement. The calculation of the to_timestamp function and the elimination of sections occurs later. If a constant were used instead of a function, only the main table and the section bookings_range_201707 would be locked. Therefore, if it is possible to specify constants in the request, this should be done, otherwise the number of lines in pg_locks will increase in proportion to the number of sections, which may result in the need to increase max_locks_per_transaction.

Without completing the previous transaction, create the following section for September in a new session:

  || => CREATE TABLE bookings_range_201709 (LIKE bookings_range); || => BEGIN; || => ALTER TABLE bookings_range ATTACH PARTITION bookings_range_201709 FOR VALUES FROM ('2017-09-01'::timestamptz) TO ('2017-10-01'::timestamptz); || => SELECT relation::regclass::text, mode FROM pg_locks WHERE pid = pg_backend_pid() AND relation::regclass::text LIKE 'bookings%';

  relation | mode -------------------------------------+-------------------------- bookings_range_201709_book_date_idx | AccessExclusiveLock bookings_range | ShareUpdateExclusiveLock bookings_range_201709 | ShareLock bookings_range_201709 | AccessExclusiveLock (4 rows)

When creating a new section, a ShareUpdateExclusiveLock lock compatible with AccessShareLock is imposed on the main table. Therefore, the operations of adding sections do not conflict with requests to a partitioned table.

 => COMMIT;

  || => COMMIT;

The sections are disabled by the ALTER TABLE ... DETACH PARTITION command. The section itself is not deleted, but becomes an independent table. From it, you can unload data, you can delete it, and if necessary, reconnect (ATTACH PARTITION).

Another shutdown option is to delete a section with the DROP TABLE command.

Unfortunately, both DROP TABLE and DETACH PARTITION use AccessExclusiveLock lock on the main table.

Default section

If you try to add an entry for which a section has not yet been created, an error will occur. If this behavior is undesirable, you can create a default section:

 => CREATE TABLE bookings_range_default PARTITION OF bookings_range DEFAULT;

Suppose that adding a record confused the date without specifying the millennium:

 => INSERT INTO bookings_range VALUES('XX0000', '0017-09-01'::timestamptz, 0) RETURNING tableoid::regclass, *;

  tableoid | book_ref | book_date | total_amount ------------------------+----------+------------------------------+-------------- bookings_range_default | XX0000 | 0017-09-01 00:00:00+02:30:17 | 0.00 (1 row) INSERT 0 1

Note that the phrase RETURNING returns a new line that falls into the default section.

After setting the current date (changing the partitioning key), the record is automatically moved to the desired section, no triggers are needed:

 => UPDATE bookings_range SET book_date = '2017-09-01'::timestamptz WHERE book_ref = 'XX0000' RETURNING tableoid::regclass, *;

  tableoid | book_ref | book_date | total_amount -----------------------+----------+------------------------+-------------- bookings_range_201709 | XX0000 | 2017-09-01 00:00:00+03 | 0.00 (1 row) UPDATE 1

Listing of values

In the demo database, the primary key of the bookings table should be the column book_ref. However, the selected partitioning scheme does not allow creating such a key:

 => ALTER TABLE bookings_range ADD PRIMARY KEY(book_ref);

 ERROR: insufficient columns in PRIMARY KEY constraint definition DETAIL: PRIMARY KEY constraint on table "bookings_range" lacks column "book_date" which is part of the partition key.

The partition key must be included in the primary key.

To break down by months and still include book_ref in the primary key, let's try a different partitioning scheme for the bookings table - according to the list of values. To do this, add the redundant column book_month as the partitioning key:

 => CREATE TABLE bookings_list ( book_ref character(6), book_month character(6), book_date timestamptz NOT NULL, total_amount numeric(10,2), PRIMARY KEY (book_ref, book_month) ) PARTITION BY LIST(book_month);

We will form sections dynamically based on data from the bookings table:

 => WITH dates AS ( SELECT date_trunc('month',min(book_date)) min_date, date_trunc('month',max(book_date)) max_date FROM bookings ), partition AS ( SELECT to_char(g.month, 'YYYYMM') AS book_month FROM dates, generate_series(dates.min_date, dates.max_date, '1 month'::interval) AS g(month) ) SELECT format('CREATE TABLE %I PARTITION OF bookings_list FOR VALUES IN (%L)', 'bookings_list_' || partition.book_month, partition.book_month) FROM partition\gexec

 CREATE TABLE CREATE TABLE CREATE TABLE

Here's what happened:

 => \d+ bookings_list

  Partitioned table "bookings.bookings_list" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+--------------------------+-----------+----------+---------+----------+--------------+------------- book_ref | character(6) | | not null | | extended | | book_month | character(6) | | not null | | extended | | book_date | timestamp with time zone | | not null | | plain | | total_amount | numeric(10,2) | | | | main | | Partition key: LIST (book_month) Indexes: "bookings_list_pkey" PRIMARY KEY, btree (book_ref, book_month) Partitions: bookings_list_201706 FOR VALUES IN ('201706'), bookings_list_201707 FOR VALUES IN ('201707'), bookings_list_201708 FOR VALUES IN ('201708')

Filling with the layout of the sections:

 => INSERT INTO bookings_list(book_ref,book_month,book_date,total_amount) SELECT book_ref,to_char(book_date, 'YYYYMM'),book_date,total_amount FROM bookings;

 INSERT 0 262788

As a retreat. For automatic filling of book_month, it is tempting to use the new functionality of version 12 - GENERATED ALWAYS columns. But, unfortunately, they cannot be used as a partitioning key. Therefore, the task of filling the month should be solved in other ways.

Integrity constraints such as CHECK and NOT NULL can be created on a partitioned table. As with inheritance, specifying INHERIT / NOINHERIT tells you whether the constraint should be inherited in all partition tables. Default INHERIT:

 => ALTER TABLE bookings_range ALTER COLUMN book_date SET NOT NULL;

 => \d bookings_range

  Partitioned table "bookings.bookings_range" Column | Type | Collation | Nullable | Default --------------+--------------------------+-----------+----------+--------- book_ref | character(6) | | | book_date | timestamp with time zone | | not null | total_amount | numeric(10,2) | | | Partition key: RANGE (book_date) Indexes: "book_date_idx" btree (book_date) Number of partitions: 5 (Use \d+ to list them.)

 => \d bookings_range_201706

  Table "bookings.bookings_range_201706" Column | Type | Collation | Nullable | Default --------------+--------------------------+-----------+----------+--------- book_ref | character(6) | | | book_date | timestamp with time zone | | not null | total_amount | numeric(10,2) | | | Partition of: bookings_range FOR VALUES FROM ('2017-06-01 00:00:00+03') TO ('2017-07-01 00:00:00+03') Indexes: "book_date_201706_idx" btree (book_date)

The EXCLUDE constraint can only be created locally on sections.

Searching for book_ref will look in all sections, but by index, thanks to the fact that book_ref is listed first:

 => EXPLAIN (COSTS OFF) SELECT * FROM bookings_list WHERE book_ref = '00000F';

  QUERY PLAN -------------------------------------------------------------------------- Append -> Index Scan using bookings_list_201706_pkey on bookings_list_201706 Index Cond: (book_ref = '00000F'::bpchar) -> Index Scan using bookings_list_201707_pkey on bookings_list_201707 Index Cond: (book_ref = '00000F'::bpchar) -> Index Scan using bookings_list_201708_pkey on bookings_list_201708 Index Cond: (book_ref = '00000F'::bpchar) (7 rows)

Search by book_ref and section range should look only within the specified range:

 => EXPLAIN (COSTS OFF) SELECT * FROM bookings_list WHERE book_ref = '00000F' AND book_month = '201707';

  QUERY PLAN ----------------------------------------------------------------------------------- Index Scan using bookings_list_201707_pkey on bookings_list_201707 Index Cond: ((book_ref = '00000F'::bpchar) AND (book_month = '201707'::bpchar)) (2 rows)

The INSERT ... ON CONFLICT command correctly finds the required section and performs the update:

 => INSERT INTO bookings_list VALUES ('XX0001','201708','2017-08-01',0) RETURNING tableoid::regclass, *;

  tableoid | book_ref | book_month | book_date | total_amount ----------------------+----------+------------+------------------------+-------------- bookings_list_201708 | XX0001 | 201708 | 2017-08-01 00:00:00+03 | 0.00 (1 row) INSERT 0 1

 => INSERT INTO bookings_list VALUES ('XX0001','201708','2017-08-01',100) ON CONFLICT(book_ref,book_month) DO UPDATE SET total_amount = 100 RETURNING tableoid::regclass, *;

  tableoid | book_ref | book_month | book_date | total_amount ----------------------+----------+------------+------------------------+-------------- bookings_list_201708 | XX0001 | 201708 | 2017-08-01 00:00:00+03 | 100.00 (1 row) INSERT 0 1

Foreign keys

In the demo database, the tickets table refers to bookings.

To make a foreign key possible, we add a column book_month, and at the same time we divide into sections by months, like bookings_list.

 => CREATE TABLE tickets_list ( ticket_no character(13), book_month character(6), book_ref character(6) NOT NULL, passenger_id varchar(20) NOT NULL, passenger_name text NOT NULL, contact_data jsonb, PRIMARY KEY (ticket_no, book_month), FOREIGN KEY (book_ref, book_month) REFERENCES bookings_list (book_ref, book_month) ) PARTITION BY LIST (book_month);

The FOREIGN KEY constraint is worth a closer look. On the one hand, this is the foreign key from the partitioned table (tickets_list), and on the other hand it is the key to the partitioned table (bookings_list). Thus, foreign keys for partitioned tables are supported in both directions.

Create sections:

 => WITH dates AS ( SELECT date_trunc('month',min(book_date)) min_date, date_trunc('month',max(book_date)) max_date FROM bookings ), partition AS ( SELECT to_char(g.month, 'YYYYMM') AS book_month FROM dates, generate_series(dates.min_date, dates.max_date, '1 month'::interval) AS g(month) ) SELECT format('CREATE TABLE %I PARTITION OF tickets_list FOR VALUES IN (%L)', 'tickets_list_' || partition.book_month, partition.book_month) FROM partition\gexec

 CREATE TABLE CREATE TABLE CREATE TABLE

 => \d+ tickets_list

  Partitioned table "bookings.tickets_list" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ----------------+-----------------------+-----------+----------+---------+----------+--------------+------------- ticket_no | character(13) | | not null | | extended | | book_month | character(6) | | not null | | extended | | book_ref | character(6) | | not null | | extended | | passenger_id | character varying(20) | | not null | | extended | | passenger_name | text | | not null | | extended | | contact_data | jsonb | | | | extended | | Partition key: LIST (book_month) Indexes: "tickets_list_pkey" PRIMARY KEY, btree (ticket_no, book_month) Foreign-key constraints: "tickets_list_book_ref_book_month_fkey" FOREIGN KEY (book_ref, book_month) REFERENCES bookings_list(book_ref, book_month) Partitions: tickets_list_201706 FOR VALUES IN ('201706'), tickets_list_201707 FOR VALUES IN ('201707'), tickets_list_201708 FOR VALUES IN ('201708')

Fill in:

 => INSERT INTO tickets_list (ticket_no,book_month,book_ref,passenger_id,passenger_name,contact_data) SELECT t.ticket_no,b.book_month,t.book_ref, t.passenger_id,t.passenger_name,t.contact_data FROM bookings_list b JOIN tickets t ON (b.book_ref = t.book_ref);

 INSERT 0 366733

 => VACUUM ANALYZE tickets_list;

The distribution of lines in sections:

 => SELECT tableoid::regclass, count(*) FROM tickets_list GROUP BY tableoid;

  tableoid | count ---------------------+-------- tickets_list_201706 | 10160 tickets_list_201707 | 232755 tickets_list_201708 | 123818 (3 rows)

Join and Aggregation Requests

Join two tables partitioned in the same way:

 => EXPLAIN (COSTS OFF) SELECT b.* FROM bookings_list b JOIN tickets_list t ON (b.book_ref = t.book_ref and b.book_month = t.book_month);

  QUERY PLAN ---------------------------------------------------------------------------- Hash Join Hash Cond: ((t.book_ref = b.book_ref) AND (t.book_month = b.book_month)) -> Append -> Seq Scan on tickets_list_201706 t -> Seq Scan on tickets_list_201707 t_1 -> Seq Scan on tickets_list_201708 t_2 -> Hash -> Append -> Seq Scan on bookings_list_201706 b -> Seq Scan on bookings_list_201707 b_1 -> Seq Scan on bookings_list_201708 b_2 (11 rows)

Before starting a join in each table, sections are first combined that fall into the query condition.

But it would be possible to first join the sections of both tables corresponding to the months, and then combine the result. This can be achieved by enabling the enable_partitionwise_join parameter:

 => SET enable_partitionwise_join = ON; => EXPLAIN (COSTS OFF) SELECT b.* FROM bookings_list b JOIN tickets_list t ON (b.book_ref = t.book_ref and b.book_month = t.book_month);

  QUERY PLAN ------------------------------------------------------------------------------------------ Append -> Hash Join Hash Cond: ((t.book_ref = b.book_ref) AND (t.book_month = b.book_month)) -> Seq Scan on tickets_list_201706 t -> Hash -> Seq Scan on bookings_list_201706 b -> Hash Join Hash Cond: ((t_1.book_ref = b_1.book_ref) AND (t_1.book_month = b_1.book_month)) -> Seq Scan on tickets_list_201707 t_1 -> Hash -> Seq Scan on bookings_list_201707 b_1 -> Hash Join Hash Cond: ((t_2.book_ref = b_2.book_ref) AND (t_2.book_month = b_2.book_month)) -> Seq Scan on tickets_list_201708 t_2 -> Hash -> Seq Scan on bookings_list_201708 b_2 (16 rows)

Now, the corresponding sections of the two tables are joined first, and then the results of the joins are combined.

Similar situation with aggregation:

 => EXPLAIN (COSTS OFF) SELECT count(*) FROM bookings_list;

  QUERY PLAN ------------------------------------------------------------------- Finalize Aggregate -> Gather Workers Planned: 2 -> Partial Aggregate -> Parallel Append -> Parallel Seq Scan on bookings_list_201707 -> Parallel Seq Scan on bookings_list_201708 -> Parallel Seq Scan on bookings_list_201706 (8 rows)

Note that scanning sections can be performed in parallel. But at first sections are united, only then aggregation begins. Alternatively, you can aggregate in each section, then combine the result:

 => SET enable_partitionwise_aggregate = ON; => EXPLAIN (COSTS OFF) SELECT count(*) FROM bookings_list;

  QUERY PLAN ------------------------------------------------------------------- Finalize Aggregate -> Gather Workers Planned: 2 -> Parallel Append -> Partial Aggregate -> Parallel Seq Scan on bookings_list_201707 -> Partial Aggregate -> Parallel Seq Scan on bookings_list_201708 -> Partial Aggregate -> Parallel Seq Scan on bookings_list_201706 (10 rows)

These features are especially important if part of the sections are external tables. By default, both are disabled, because The corresponding parameters affect the time of building the plan, but can not always be used.

Hash partitioning

The third way to partition a table is hash partitioning.

Creating a table:

 => CREATE TABLE bookings_hash ( book_ref character(6) PRIMARY KEY, book_date timestamptz NOT NULL, total_amount numeric(10,2) ) PARTITION BY HASH(book_ref);

In this embodiment, book_ref, as a partitioning key, can be immediately declared a primary key.

Splitting into three sections:

 => CREATE TABLE bookings_hash_p0 PARTITION OF bookings_hash FOR VALUES WITH (MODULUS 3, REMAINDER 0); => CREATE TABLE bookings_hash_p1 PARTITION OF bookings_hash FOR VALUES WITH (MODULUS 3, REMAINDER 1); => CREATE TABLE bookings_hash_p2 PARTITION OF bookings_hash FOR VALUES WITH (MODULUS 3, REMAINDER 2);

Filling with automatic layout in sections:

 => INSERT INTO bookings_hash SELECT * FROM bookings;

 INSERT 0 262788

The distribution of lines in sections occurs evenly:

 => SELECT tableoid::regclass AS partition, count(*) FROM bookings_hash GROUP BY tableoid;

  partition | count ------------------+------- bookings_hash_p0 | 87649 bookings_hash_p1 | 87651 bookings_hash_p2 | 87488 (3 rows)

New command to view partitioned objects:

 => \dP+

  List of partitioned relations Schema | Name | Owner | Type | Table | Total size | Description ----------+--------------------+---------+-------------------+----------------+------------+------------- bookings | bookings_hash | student | partitioned table | | 13 MB | bookings | bookings_list | student | partitioned table | | 15 MB | bookings | bookings_range | student | partitioned table | | 13 MB | bookings | tickets_list | student | partitioned table | | 50 MB | bookings | book_date_idx | student | partitioned index | bookings_range | 5872 kB | bookings | bookings_hash_pkey | student | partitioned index | bookings_hash | 5800 kB | bookings | bookings_list_pkey | student | partitioned index | bookings_list | 8120 kB | bookings | tickets_list_pkey | student | partitioned index | tickets_list | 19 MB | (8 rows)

 => VACUUM ANALYZE bookings_hash;

Subqueries and nested loop join

The elimination of sections at runtime is possible when connecting nested loops.

Distribution of the first 10 bookings in sections:

 => WITH top10 AS ( SELECT tableoid::regclass AS partition, * FROM bookings_hash ORDER BY book_ref LIMIT 10 ) SELECT partition, count(*) FROM top10 GROUP BY 1 ORDER BY 1;

  partition | count ------------------+------- bookings_hash_p0 | 3 bookings_hash_p1 | 3 bookings_hash_p2 | 4 (3 rows)

Let's look at the query execution plan with the connection of the bookings_hash table and the previous subquery:

 => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) WITH top10 AS ( SELECT tableoid::regclass AS partition, * FROM bookings ORDER BY book_ref LIMIT 10 ) SELECT bh.* FROM bookings_hash bh JOIN top10 ON bh.book_ref = top10.book_ref;

  QUERY PLAN ----------------------------------------------------------------------------------------------------- Nested Loop (actual rows=10 loops=1) -> Limit (actual rows=10 loops=1) -> Index Only Scan using bookings_pkey on bookings (actual rows=10 loops=1) Heap Fetches: 0 -> Append (actual rows=1 loops=10) -> Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=1 loops=3) Index Cond: (book_ref = bookings.book_ref) -> Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (actual rows=1 loops=3) Index Cond: (book_ref = bookings.book_ref) -> Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (actual rows=1 loops=4) Index Cond: (book_ref = bookings.book_ref) Planning Time: 0.632 ms Execution Time: 0.278 ms (13 rows)

Connection is performed using the nested loops method. The outer loop in a common table expression is executed 10 times. But pay attention to the number of calls to the table-sections (loops). For each value of the external loop's book_ref, only the section is viewed where this value is stored in the bookings_hash table.

Compare with disabled section exclusion:

 => SET enable_partition_pruning TO OFF; => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) WITH top10 AS ( SELECT tableoid::regclass AS partition, * FROM bookings ORDER BY book_ref LIMIT 10 ) SELECT bh.* FROM bookings_hash bh JOIN top10 ON bh.book_ref = top10.book_ref;

  QUERY PLAN ------------------------------------------------------------------------------------------------------ Nested Loop (actual rows=10 loops=1) -> Limit (actual rows=10 loops=1) -> Index Only Scan using bookings_pkey on bookings (actual rows=10 loops=1) Heap Fetches: 0 -> Append (actual rows=1 loops=10) -> Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=0 loops=10) Index Cond: (book_ref = bookings.book_ref) -> Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (actual rows=0 loops=10) Index Cond: (book_ref = bookings.book_ref) -> Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (actual rows=0 loops=10) Index Cond: (book_ref = bookings.book_ref) Planning Time: 0.886 ms Execution Time: 0.771 ms (13 rows)

 => RESET enable_partition_pruning;

If you reduce the sample to one reservation, then two sections will not be viewed at all:

 => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) WITH top AS ( SELECT tableoid::regclass AS partition, * FROM bookings ORDER BY book_ref LIMIT 1 ) SELECT bh.* FROM bookings_hash bh JOIN top ON bh.book_ref = top.book_ref;

  QUERY PLAN --------------------------------------------------------------------------------------------------- Nested Loop (actual rows=1 loops=1) -> Limit (actual rows=1 loops=1) -> Index Only Scan using bookings_pkey on bookings (actual rows=1 loops=1) Heap Fetches: 0 -> Append (actual rows=1 loops=1) -> Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=1 loops=1) Index Cond: (book_ref = bookings.book_ref) -> Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (never executed) Index Cond: (book_ref = bookings.book_ref) -> Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (never executed) Index Cond: (book_ref = bookings.book_ref) Planning Time: 0.250 ms Execution Time: 0.090 ms (13 rows)

Instead of a subquery, you can use the function returning the set with the category of variability STABLE:

 => CREATE OR REPLACE FUNCTION get_book_ref(top int) RETURNS SETOF bookings AS $$ BEGIN RETURN QUERY EXECUTE 'SELECT * FROM bookings ORDER BY book_ref LIMIT $1' USING top; END;$$ LANGUAGE plpgsql STABLE;

 => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) SELECT * FROM bookings_hash bh JOIN get_book_ref(10) f ON bh.book_ref = f.book_ref;

  QUERY PLAN ----------------------------------------------------------------------------------------------------- Nested Loop (actual rows=10 loops=1) -> Function Scan on get_book_ref f (actual rows=10 loops=1) -> Append (actual rows=1 loops=10) -> Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=1 loops=3) Index Cond: (book_ref = f.book_ref) -> Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (actual rows=1 loops=3) Index Cond: (book_ref = f.book_ref) -> Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (actual rows=1 loops=4) Index Cond: (book_ref = f.book_ref) Planning Time: 0.175 ms Execution Time: 0.843 ms (11 rows)

Results

Summarizing, we can say that the built-in or declarative partitioning in PostgreSQL 12 has received a rich set of features and can be safely recommended to replace partitioning through inheritance.

Source: https://habr.com/ru/post/456716/

All Articles