
Postgres 9.x partitioning: using pg_pathman to optimize insertion and partition pruning

Hello! I want to talk about how partitioning works in the current PostgreSQL 9.x and how it can be improved with the pg_pathman extension created by the folks at Postgres Professional. The article is intended for developers who are familiar with partitioning and need to split a large database in Postgres, or for those who want to estimate the effort of migrating an already partitioned non-Postgres database to Postgres.



First we will create a database schema, then partition it in two ways (“regular” and pg_pathman), then fill it with data and check how queries on partitioned tables work.



I will also show how to introduce this remarkable extension into a schema that has already been partitioned in the "regular" way.



1. Create a data schema



Our small schema will consist of three tables:


• a parent table partitioned into three parts,

• a child table referencing it, partitioned the same way,

• a regular result table with references to parent records, including their partition numbers.



The parent and child tables are partitioned by partition_id. They carry data columns parent_data and child_data, respectively, and the child table additionally has an indexed field child_index that will be used in subqueries.





create table public.parent (
    parent_id    numeric not null,
    parent_data  varchar(100),
    partition_id numeric not null,
    primary key (parent_id)
);

create table public.child (
    child_id     numeric not null,
    parent_id    numeric not null,
    child_data   varchar(100),
    child_index  numeric,
    partition_id numeric not null,
    primary key (child_id)
);

create index ix_child_parent on public.child(parent_id);
create index ix_child_index on public.child(child_index);

create table public.result (
    result_id    numeric not null,
    parent_id    numeric not null,
    partition_id numeric not null
);

create index idx_result on public.result(result_id);


2. We partition the schema in the "regular" way



So, if the headline caught your eye and you decided to read this article, you probably already know that Postgres does not have built-in partitioning yet. Instead, the existing inheritance mechanism is used.



Here's what it looks like:



create table public.parent_1 (
    like public.parent including indexes,
    check(partition_id = 1)
) inherits (public.parent);

create table public.parent_2 (
    like public.parent including indexes,
    check(partition_id = 2)
) inherits (public.parent);

create table public.parent_3 (
    like public.parent including indexes,
    check(partition_id = 3)
) inherits (public.parent);

create table public.child_1 (
    like public.child including indexes,
    check(partition_id = 1)
) inherits (public.child);

create table public.child_2 (
    like public.child including indexes,
    check(partition_id = 2)
) inherits (public.child);

create table public.child_3 (
    like public.child including indexes,
    check(partition_id = 3)
) inherits (public.child);

create or replace function public.trigger_function()
returns trigger as $trg_func$
begin
    execute 'insert into ' || tg_table_name || '_' || new.partition_id || ' values ($1.*)' using new;
    return null;
end;
$trg_func$ language plpgsql security definer;

create trigger parent_trigger
    before insert on public.parent
    for each row execute procedure public.trigger_function();

create trigger child_trigger
    before insert on public.child
    for each row execute procedure public.trigger_function();


We split each of our two tables into three parts; I did not create more partitions, otherwise the query plans would turn this article into an endless scroll. The partitions are created with LIKE ... INCLUDING INDEXES, so the primary key and the other indexes are copied to the inherited tables. A local primary key does not give global uniqueness, but it does give uniqueness within each partition, along with local indexes. Each inherited table also gets a CHECK constraint on the partition identifier so that the standard constraint-exclusion pruning works.

Triggers with the generic trigger function public.trigger_function are used to route inserted rows into their partitions. The drawback of this mechanism is that an insert into the main table reports 0 inserted rows.
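
A minimal illustration of that drawback (a sketch with a hypothetical test row, assuming the tables and triggers above are in place): the command tag reports zero rows even though the row ends up in its partition.

-- The BEFORE INSERT trigger returns NULL, so the command tag reports 0 rows...
insert into public.parent (parent_id, parent_data, partition_id)
values (-1, 'trigger test', 1);
-- psql prints: INSERT 0 0

-- ...although the row did land in the partition:
select count(*) from public.parent_1 where parent_id = -1;  -- 1

-- remove the test row
delete from public.parent_1 where parent_id = -1;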



3. We partition the schema using pg_pathman



Partitioning tables with this extension in postgres is done by calling the special function create_range_partitions:



select public.create_range_partitions(c.oid, 'partition_id', 1, 1, 3)
from pg_class c
inner join pg_namespace n on c.relnamespace = n.oid and n.nspname = 'public'
where c.relname in ('parent', 'child');


With this kind of partitioning, all indexes of the source tables are copied to the partition tables, and the special check constraints needed by the extension are created.
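
To see what exactly was created, the partitions and their constraints can be inspected; a small sketch (assuming the pathman_partition_list view shipped with the extension, which as far as I remember lists parent/partition pairs, is available in this version):

-- partitions that pg_pathman registered for our tables
select *
from public.pathman_partition_list
where parent in ('public.parent'::regclass, 'public.child'::regclass);

-- check constraints generated on one of the partitions
select pg_get_constraintdef(oid)
from pg_constraint
where conrelid = 'public.parent_1'::regclass
  and contype = 'c';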



4. Testing methodology



Before showing the results, I will describe the test bench and the methodology.

The test bench is a VMware virtual machine with 2 GB of RAM running Red Hat Server 6.7, PostgreSQL 9.6.3 and pg_pathman 1.4.2. In the Postgres config: shared_buffers = 512MB, effective_cache_size = 1GB, constraint_exclusion = partition.

The insert block and each query were tested as follows:



1. When testing a query, the query is first run once so that its data gets pulled into the host's (Windows) disk cache; for the insert test this warm-up step is skipped. I did not manage to drop that cache only for the virtual machine's files while keeping it for the other applications, and dropping it completely caused such severe slowdowns that nothing could be measured reliably. Therefore we will not pay much attention to query execution times; instead we will look at the number of data blocks the queries read. On a typical large system this number is proportional to the number of disk reads, and therefore to the query execution time.



2. The Linux caches are dropped with the command echo 3 > /proc/sys/vm/drop_caches.



3. Postgres is restarted in order to reset its caches.



4. The query or insert is executed prefixed with explain (analyze, verbose, buffers) to obtain the plan, timing and other execution characteristics, for example as shown below.
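
For instance, the measurement of the first query from section 7.1 was taken like this:

explain (analyze, verbose, buffers)
select *
from public.parent p, public.child c
where c.partition_id = p.partition_id
  and c.parent_id = p.parent_id
  and p.partition_id = 3
limit 100;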



5. Insert the data



We insert the data with the following script:



insert into public.parent (parent_id, parent_data, partition_id)
select a, gen_random_bytes(25), trunc(random() * 3) + 1
from generate_series(1, 10000000) a;

insert into public.child (child_id, parent_id, child_data, child_index, partition_id)
select parent_id * 20 + a, parent_id, gen_random_bytes(25), trunc(random() * 100000) + 1, partition_id
from public.parent, generate_series(1, 20) a
where random() < 0.1;

insert into public.result (result_id, parent_id, partition_id)
select a, parent_id, partition_id
from generate_series(1, 100) a, public.parent
where random() < 0.001;


We insert 10 million entries into the parent table, about twice as many in the child.

Each child_index value corresponds to approximately 200 rows in child. The result table gets about a million rows in total, i.e. roughly ten thousand parent references per result_id value.
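
These volumes are easy to sanity-check once the inserts have finished (a sketch; the exact figures will vary because of random()):

-- rows per parent partition: roughly 3.3 million each
select partition_id, count(*) from public.parent group by partition_id order by partition_id;

-- total child rows and average rows per child_index value (roughly 200)
select count(*), count(*)::numeric / count(distinct child_index) from public.child;

-- parent references for one result_id value (roughly ten thousand)
select count(*) from public.result where result_id = 50;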



After inserting records, analyze all tables.



analyze public.parent;
analyze public.parent_1;
analyze public.parent_2;
analyze public.parent_3;
analyze public.child;
analyze public.child_1;
analyze public.child_2;
analyze public.child_3;
analyze public.result;


So, the insert into our two tables partitioned the standard Postgres way took 38:39, while with pg_pathman it took 15:10; in other words, insertion became about two and a half times faster. With pg_pathman a special plan step, PartitionFilter, appears to redirect each inserted record, the trigger is no longer needed, and the effect is clearly visible. Getting rid of the redirecting trigger also means the command reports the real number of inserted rows, just like an insert into an ordinary table. For example, here is the plan of the insert into the parent table:



 "Insert on parent (cost=0.00..45.00 rows=1000 width=282)" " -> Custom Scan (PartitionFilter) (cost=0.00..45.00 rows=1000 width=282)" " -> Subquery Scan on "*SELECT*" (cost=0.00..45.00 rows=1000 width=282)" " -> Function Scan on generate_series a (cost=0.00..22.50 rows=1000 width=44)" 


6. Implement pg_pathman in an already partitioned schema



To compare partitioning performance properly, the same data is needed. So I will build the schema partitioned the standard way and test the queries with the method described above, and then introduce the pg_pathman extension into that same schema and test it again.



The extension has functions both for partitioning an empty table and for partitioning a table with data in the background. But there is no mechanism for registering a table that is already partitioned and filled with data, which is a pity.
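
For reference, if the source table were not partitioned at all yet, the extension's own tooling could be used instead; roughly like this (a hedged sketch from my reading of the docs, not used in this article):

-- create the range partitions but do not migrate the existing rows yet
-- (the final 'false' is the partition_data flag)
select public.create_range_partitions('public.parent'::regclass, 'partition_id', 1, 1, 3, false);

-- move the rows from the parent into the partitions in the background, batch by batch
select public.partition_table_concurrently('public.parent'::regclass);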



We cannot simply register the main table right away: pg_pathman also relies on the inheritance mechanism to associate partitions with the main table.



And when we try to register the main table, pg_pathman does not find its own constraints on the partitions and raises an error.



But this problem can be solved in the following way:



1) First, detach all partition tables (the heirs) from the main tables:



alter table parent_1 no inherit parent;
alter table parent_2 no inherit parent;
alter table parent_3 no inherit parent;
alter table child_1 no inherit child;
alter table child_2 no inherit child;
alter table child_3 no inherit child;


2) Then you need to register the main tables in pg_pathman:



select public.add_to_pathman_config(c.oid, 'partition_id', '1')
from pg_class c
inner join pg_namespace n on n.oid = c.relnamespace and n.nspname = 'public'
where c.relname in ('parent', 'child');


3) Register partitions with pg_pathman:



select public.attach_range_partition(parent.oid, child.oid,
                                     partition_id.partition_id::numeric,
                                     (partition_id.partition_id + 1)::numeric)
from pg_namespace n
inner join pg_class parent
        on n.oid = parent.relnamespace and parent.relname in ('parent', 'child')
inner join generate_series(1,3) partition_id on 1=1
inner join pg_class child
        on n.oid = child.relnamespace
       and child.relname = parent.relname || '_' || partition_id.partition_id
where n.nspname = 'public';


4) Now you can drop the triggers and the trigger function:



drop trigger parent_trigger on public.parent;
drop trigger child_trigger on public.child;
drop function trigger_function();


5) Finally, check inserts and queries; everything should work.
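
A quick sanity check could look like this (a sketch with a hypothetical test row):

-- the insert now goes through pg_pathman's PartitionFilter,
-- so the command tag reports the real row count
insert into public.parent (parent_id, parent_data, partition_id)
values (-1, 'pathman test', 2);
-- psql prints: INSERT 0 1

-- pruning by a constant key should leave only parent_2 in the plan
explain (costs off) select * from public.parent where partition_id = 2;

-- remove the test row
delete from public.parent_2 where parent_id = -1;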



7. We test the queries



The standard partitioning mechanism works through check constraints on the partitions, which the planner compares with the constants given in the query.
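
For example, with constraint_exclusion = partition the planner drops every partition whose CHECK constraint contradicts the literal in the query (a small sketch; the full plans below show the same effect):

-- only parent_3 (plus the empty main table public.parent) survives planning
explain (costs off)
select * from public.parent where partition_id = 3;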



In this article I am using pg_pathman version 1.4.2. The extension's documentation describes a special new plan node, RuntimeAppend, which replaces the standard Append and differs from it by selecting the partitions to scan by key at execution time rather than at planning time. That is, partition pruning in this case should work dynamically, using the partitioning key of each record.



Therefore, we will first look at how queries behave with a single constant partition key value, then with two constant values, and finally with partition key values taken from a regular table. The query plans include execution statistics. For each query, the result for the standard partitioning mechanism (postgres) is shown first, followed by the pg_pathman one.



7.1. Query joining partitioned tables in the FROM clause



One constant in the partitioning key condition:



select *
from public.parent p, public.child c
where c.partition_id = p.partition_id
  and c.parent_id = p.parent_id
  and p.partition_id = 3
limit 100;


postgres:



 "Limit (cost=20.41..31.31 rows=100 width=181) (actual time=2.517..3.093 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=4 read=9" ... "Planning time: 19.035 ms" "Execution time: 3.607 ms" 


Fully
 "Limit (cost=20.41..31.31 rows=100 width=181) (actual time=2.517..3.093 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=4 read=9" " -> Merge Join (cost=20.41..726234.12 rows=6662376 width=181) (actual time=2.516..3.079 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Merge Cond: (p.parent_id = c.parent_id)" " Buffers: shared hit=4 read=9" " -> Merge Append (cost=0.56..184529.87 rows=3332147 width=84) (actual time=1.397..1.414 rows=47 loops=1)" " Sort Key: p.parent_id" " Buffers: shared hit=1 read=4" " -> Index Scan using parent_pkey on public.parent p (cost=0.12..8.14 rows=1 width=282) (actual time=0.006..0.006 rows=0 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = '3'::numeric)" " Buffers: shared hit=1" " -> Index Scan using parent_3_pkey on public.parent_3 p_1 (cost=0.43..142869.88 rows=3332146 width=84) (actual time=1.389..1.400 rows=47 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = '3'::numeric)" " Buffers: shared read=4" " -> Materialize (cost=0.57..450113.45 rows=6662376 width=97) (actual time=1.094..1.605 rows=100 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=3 read=5" " -> Merge Append (cost=0.57..433457.51 rows=6662376 width=97) (actual time=1.089..1.590 rows=100 loops=1)" " Sort Key: c.parent_id" " Buffers: shared hit=3 read=5" " -> Index Scan using ix_child_parent on public.child c (cost=0.12..8.14 rows=1 width=346) (actual time=0.004..0.004 rows=0 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Filter: (c.partition_id = '3'::numeric)" " Buffers: shared hit=1" " -> Index Scan using child_3_parent_id_idx on public.child_3 c_1 (cost=0.43..350169.66 rows=6662375 width=97) (actual time=1.083..1.554 rows=100 loops=1)" " Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id" " Filter: (c_1.partition_id = '3'::numeric)" " Buffers: shared hit=2 read=5" "Planning time: 19.035 ms" "Execution time: 3.607 ms" 


pg_pathman:



 "Limit (cost=15.99..25.01 rows=100 width=628) (actual time=0.934..1.111 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=9" ... "Planning time: 16.164 ms" "Executiontime: 1.174 ms" 


Fully
 "Limit (cost=15.99..25.01 rows=100 width=628) (actual time=0.934..1.111 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=9" " -> Merge Join (cost=15.99..601209.25 rows=6662375 width=628) (actual time=0.934..1.101 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Merge Cond: (p.parent_id = c.parent_id)" " Buffers: shared hit=2 read=9" " -> Merge Append (cost=0.44..142832.56 rows=3332146 width=84) (actual time=0.464..0.480 rows=47 loops=1)" " Sort Key: p.parent_id" " Buffers: shared read=4" " -> Index Scan using parent_3_pkey on public.parent_3 p (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.464..0.477 rows=47 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = '3'::numeric)" " Buffers: shared read=4" " -> Materialize (cost=0.44..366781.74 rows=6662375 width=97) (actual time=0.464..0.583 rows=100 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=5" " -> Merge Append (cost=0.44..350125.80 rows=6662375 width=97) (actual time=0.459..0.563 rows=100 loops=1)" " Sort Key: c.parent_id" " Buffers: shared hit=2 read=5" " -> Index Scan using child_3_parent_id_idx on public.child_3 c (cost=0.43..350125.79 rows=6662375 width=97) (actual time=0.459..0.550 rows=100 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Filter: (c.partition_id = '3'::numeric)" " Buffers: shared hit=2 read=5" "Planning time: 16.164 ms" "Executiontime: 1.174 ms" 


Two constants in the partitioning key condition:



select *
from public.parent p, public.child c
where c.partition_id = p.partition_id
  and c.parent_id = p.parent_id
  and p.partition_id in (2, 3)
limit 100;


postgres:



 "Limit (cost=44.98..78.34 rows=100 width=181) (actual time=8.167..8.543 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=21" ... "Planning time: 17.618 ms" "Execution time: 8.870 ms" 


Fully
 "Limit (cost=44.98..78.34 rows=100 width=181) (actual time=8.167..8.543 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=21" " -> Merge Join (cost=44.98..2223585.66 rows=6665904 width=181) (actual time=8.165..8.534 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Merge Cond: (p.parent_id = c.parent_id)" " Join Filter: (p.partition_id = c.partition_id)" " Buffers: shared hit=2 read=21" " -> Merge Append (cost=1.01..408142.24 rows=6665413 width=84) (actual time=5.390..5.416 rows=53 loops=1)" " Sort Key: p.parent_id" " Buffers: shared hit=1 read=8" " -> Index Scan using parent_pkey on public.parent p (cost=0.12..8.14 rows=1 width=282) (actual time=0.004..0.004 rows=0 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared hit=1" " -> Index Scan using parent_2_pkey on public.parent_2 p_1 (cost=0.43..142956.36 rows=3333266 width=84) (actual time=3.594..3.598 rows=28 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Index Scan using parent_3_pkey on public.parent_3 p_2 (cost=0.43..142869.88 rows=3332146 width=84) (actual time=1.786..1.798 rows=26 loops=1)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Filter: (p_2.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Materialize (cost=1.46..1498856.74 rows=19997711 width=97) (actual time=2.746..3.039 rows=173 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=1 read=13" " -> Merge Append (cost=1.46..1448862.46 rows=19997711 width=97) (actual time=2.736..2.987 rows=173 loops=1)" " Sort Key: c.parent_id" " Buffers: shared hit=1 read=13" " -> Index Scan using ix_child_parent on public.child c (cost=0.12..8.14 rows=1 width=346) (actual time=0.004..0.004 rows=0 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=1" " -> Index Scan using child_1_parent_id_idx on public.child_1 c_1 (cost=0.43..331779.20 rows=6666670 width=97) (actual time=0.559..0.744 rows=74 loops=1)" " Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id" " Buffers: shared read=5" " -> Index Scan using child_2_parent_id_idx on public.child_2 c_2 (cost=0.43..333612.86 rows=6668665 width=97) (actual time=1.022..1.029 rows=51 loops=1)" " Output: c_2.child_id, c_2.parent_id, c_2.child_data, c_2.child_index, c_2.partition_id" " Buffers: shared read=4" " -> Index Scan using child_3_parent_id_idx on public.child_3 c_3 (cost=0.43..333513.72 rows=6662375 width=97) (actual time=1.146..1.152 rows=50 loops=1)" " Output: c_3.child_id, c_3.parent_id, c_3.child_data, c_3.child_index, c_3.partition_id" " Buffers: shared read=4" "Planning time: 17.618 ms" "Execution time: 8.870 ms" 


pg_pathman:



 "Limit (cost=32.45..57.22 rows=100 width=628) (actual time=2.457..2.714 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared read=21" ... "Planning time: 18.944 ms" "Executiontime: 2.798 ms" 


Fully
 "Limit (cost=32.45..57.22 rows=100 width=628) (actual time=2.457..2.714 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared read=21" " -> Merge Join (cost=32.45..1651206.45 rows=6665904 width=628) (actual time=2.455..2.705 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Merge Cond: (p.parent_id = c.parent_id)" " Join Filter: (p.partition_id = c.partition_id)" " Buffers: shared read=21" " -> Merge Append (cost=0.87..285722.16 rows=6665412 width=84) (actual time=1.011..1.037 rows=53 loops=1)" " Sort Key: p.parent_id" " Buffers: shared read=8" " -> Index Scan using parent_2_pkey on public.parent_2 p (cost=0.43..142889.60 rows=3333266 width=84) (actual time=0.522..0.528 rows=28 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Index Scan using parent_3_pkey on public.parent_3 p_1 (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.486..0.493 rows=26 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Materialize (cost=1.32..1048885.37 rows=19997710 width=97) (actual time=1.437..1.595 rows=173 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared read=13" " -> Merge Append (cost=1.32..998891.10 rows=19997710 width=97) (actual time=1.432..1.561 rows=173 loops=1)" " Sort Key: c.parent_id" " Buffers: shared read=13" " -> Index Scan using child_1_parent_id_idx on public.child_1 c (cost=0.43..331772.15 rows=6666670 width=97) (actual time=0.433..0.510 rows=74 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared read=5" " -> Index Scan using child_2_parent_id_idx on public.child_2 c_1 (cost=0.43..333608.02 rows=6668665 width=97) (actual time=0.561..0.563 rows=51 loops=1)" " Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id" " Buffers: shared read=4" " -> Index Scan using child_3_parent_id_idx on public.child_3 c_2 (cost=0.43..333510.91 rows=6662375 width=97) (actual time=0.437..0.442 rows=50 loops=1)" " Output: c_2.child_id, c_2.parent_id, c_2.child_data, c_2.child_index, c_2.partition_id" " Buffers: shared read=4" "Planning time: 18.944 ms" "Executiontime: 2.798 ms" 


Partitioning key from the table:



select *
from public.result r, public.parent p
where r.partition_id = p.partition_id
  and r.parent_id = p.parent_id
  and r.result_id = 50
limit 100;


postgres:



 "Limit (cost=199.17..6528.51 rows=100 width=100) (actual time=14.815..65.450 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=651 read=431" ... "Planning time: 13.773 ms" "Execution time: 65.750 ms" 


Fully
 "Limit (cost=199.17..6528.51 rows=100 width=100) (actual time=14.815..65.450 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=651 read=431" " -> Nested Loop (cost=199.17..192800.93 rows=3043 width=100) (actual time=14.813..65.407 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=651 read=431" " -> Bitmap Heap Scan on public.result r (cost=199.17..7003.62 rows=9129 width=16) (actual time=11.301..17.396 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id" " Recheck Cond: (r.result_id = '50'::numeric)" " Heap Blocks: exact=49" " Buffers: shared read=87" " -> Bitmap Index Scan on idx_result (cost=0.00..196.89 rows=9129 width=0) (actual time=10.438..10.438 rows=10051 loops=1)" " Index Cond: (r.result_id = '50'::numeric)" " Buffers: shared read=38" " -> Append (cost=0.00..20.31 rows=4 width=84) (actual time=0.255..0.477 rows=1 loops=100)" " Buffers: shared hit=651 read=344" " -> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=100)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: ((r.partition_id = p.partition_id) AND (r.parent_id = p.parent_id))" " -> Index Scan using parent_1_pkey on public.parent_1 p_1 (cost=0.43..6.77 rows=1 width=84) (actual time=0.252..0.253 rows=1 loops=100)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Index Cond: (p_1.parent_id = r.parent_id)" " Filter: (r.partition_id = p_1.partition_id)" " Buffers: shared hit=222 read=178" " -> Index Scan using parent_2_pkey on public.parent_2 p_2 (cost=0.43..6.77 rows=1 width=84) (actual time=0.112..0.112 rows=0 loops=99)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Index Cond: (p_2.parent_id = r.parent_id)" " Filter: (r.partition_id = p_2.partition_id)" " Buffers: shared hit=214 read=84" " -> Index Scan using parent_3_pkey on public.parent_3 p_3 (cost=0.43..6.77 rows=1 width=84) (actual time=0.108..0.108 rows=0 loops=99)" " Output: p_3.parent_id, p_3.parent_data, p_3.partition_id" " Index Cond: (p_3.parent_id = r.parent_id)" " Filter: (r.partition_id = p_3.partition_id)" " Buffers: shared hit=215 read=82" "Planning time: 13.773 ms" "Execution time: 65.750 ms" 


pg_pathman:



 "Limit (cost=199.60..2457.44 rows=100 width=298) (actual time=7.354..30.780 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=222 read=265" ... "Planning time: 11.799 ms" "Executiontime: 30.920 ms" 


Fully
 "Limit (cost=199.60..2457.44 rows=100 width=298) (actual time=7.354..30.780 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=222 read=265" " -> Nested Loop (cost=199.60..68905.63 rows=3043 width=298) (actual time=7.352..30.742 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=222 read=265" " -> Bitmap Heap Scan on public.result r (cost=199.17..7003.62 rows=9129 width=16) (actual time=6.838..10.367 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id" " Recheck Cond: (r.result_id = '50'::numeric)" " Heap Blocks: exact=49" " Buffers: shared read=87" " -> Bitmap Index Scan on idx_result (cost=0.00..196.89 rows=9129 width=0) (actual time=5.997..5.997 rows=10051 loops=1)" " Index Cond: (r.result_id = '50'::numeric)" " Buffers: shared read=38" " -> Custom Scan (RuntimeAppend) (cost=0.43..6.77 rows=1 width=84) (actual time=0.194..0.196 rows=1 loops=100)" " Output: p.parent_id, p.parent_data, p.partition_id" " Prune by: (r.partition_id = p.partition_id)" " Buffers: shared hit=222 read=178" " -> Index Scan using parent_1_pkey on public.parent_1 p_1 (cost=0.43..6.77 rows=1 width=84) (actual time=0.193..0.194 rows=1 loops=100)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Index Cond: (p_1.parent_id = r.parent_id)" " Filter: (r.partition_id = p_1.partition_id)" " Buffers: shared hit=222 read=178" "Planning time: 11.799 ms" "Executiontime: 30.920 ms" 


With the standard partitioning method, pruning with a single constant propagated to the joined partitioned table as well and cut off the unneeded partitions. The empty main tables were still scanned, though, since they carry no partition check constraint. In the query with two constants, pruning worked only on the parent table, whose partition key the condition references; the child table joined to it was scanned in full. In other words, pruning the partitions of joined tables only works when a single constant value of the partition key is given. In the query with partition keys coming from a regular table, pruning did not happen at all, which is very bad, although it matches the documentation.



The pg_pathman extension excluded the empty main tables from consideration, but for the first and second queries it did not change the plan structure, and the number of blocks read was almost the same, which is not surprising.



But pg_pathman seems to have a bug in the row width estimate: in the plain version the Merge Join row width is the sum of the widths of the joined tables' rows (width = 181 = 84 + 97), whereas with the extension it apparently takes the maximum possible row widths of the joined tables, namely the theoretical maximum widths of the main tables parent and child (width = 628 = 282 + 346). And this despite the fact that pg_pathman excludes those main tables from consideration.



Also, in the two-constant query the child table (minus its empty main table) is scanned in full; pruning and the magic RuntimeAppend step for some reason do not kick in.



In the query with partitioning keys taken from a regular table, the RuntimeAppend step does fire: only the required partition is scanned, which speeds up the query. This is easy to see by comparing the buffer counts of the two plans: Buffers: shared hit=651 read=431 versus Buffers: shared hit=222 read=265.



Queries with joins written as inner and left join behave similarly, so we will skip the analysis of their plans.



7.2. Query with a subquery in the select list



One constant in the partitioning key condition:



select *,
       (select min(c.child_data)
        from public.child c
        where c.partition_id = p.partition_id
          and c.parent_id = p.parent_id)
from public.parent p
where p.partition_id = 3
limit 100;


postgres:



 "Limit (cost=0.00..2588.75 rows=100 width=116) (actual time=1.939..4.835 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=977 read=18" ... "Planning time: 14.819 ms" "Execution time: 4.935 ms" 


Fully
 "Limit (cost=0.00..2588.75 rows=100 width=116) (actual time=1.939..4.835 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=977 read=18" " -> Result (cost=0.00..86260794.66 rows=3332147 width=116) (actual time=1.936..4.817 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)" " Buffers: shared hit=977 read=18" " -> Append (cost=0.00..89615.83 rows=3332147 width=84) (actual time=0.026..0.130 rows=100 loops=1)" " Buffers: shared read=2" " -> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=282) (actual time=0.002..0.002 rows=0 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = '3'::numeric)" " -> Seq Scan on public.parent_3 p_1 (cost=0.00..89615.83 rows=3332146 width=84) (actual time=0.024..0.117 rows=100 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = '3'::numeric)" " Buffers: shared read=2" " SubPlan 1" " -> Aggregate (cost=25.84..25.85 rows=1 width=32) (actual time=0.046..0.046 rows=1 loops=100)" " Output: min((c.child_data)::text)" " Buffers: shared hit=977 read=16" " -> Append (cost=0.00..25.78 rows=25 width=79) (actual time=0.042..0.044 rows=2 loops=100)" " Buffers: shared hit=977 read=16" " -> Seq Scan on public.child c (cost=0.00..0.00 rows=1 width=218) (actual time=0.000..0.000 rows=0 loops=100)" " Output: c.child_data" " Filter: ((c.partition_id = p.partition_id) AND (c.parent_id = p.parent_id))" " -> Index Scan using child_1_parent_id_idx on public.child_1 c_1 (cost=0.43..8.59 rows=8 width=73) (actual time=0.013..0.013 rows=0 loops=100)" " Output: c_1.child_data" " Index Cond: (c_1.parent_id = p.parent_id)" " Filter: (c_1.partition_id = p.partition_id)" " Buffers: shared hit=296 read=4" " -> Index Scan using child_2_parent_id_idx on public.child_2 c_2 (cost=0.43..8.59 rows=8 width=73) (actual time=0.012..0.012 rows=0 loops=100)" " Output: c_2.child_data" " Index Cond: (c_2.parent_id = p.parent_id)" " Filter: (c_2.partition_id = p.partition_id)" " Buffers: shared hit=296 read=4" " -> Index Scan using child_3_parent_id_idx on public.child_3 c_3 (cost=0.43..8.59 rows=8 width=73) (actual time=0.015..0.016 rows=2 loops=100)" " Output: c_3.child_data" " Index Cond: (c_3.parent_id = p.parent_id)" " Filter: (c_3.partition_id = p.partition_id)" " Buffers: shared hit=385 read=8" "Planning time: 14.819 ms" "Execution time: 4.935 ms" 


pg_pathman:



 "Limit (cost=0.00..865.96 rows=100 width=314) (actual time=1.569..3.879 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=385 read=10" ... "Planning time: 22.729 ms" "Executiontime: 4.006 ms" 


Fully
 "Limit (cost=0.00..865.96 rows=100 width=314) (actual time=1.569..3.879 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=385 read=10" " -> Result (cost=0.00..28854985.31 rows=3332146 width=314) (actual time=1.564..3.858 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)" " Buffers: shared hit=385 read=10" " -> Append (cost=0.00..89615.83 rows=3332146 width=84) (actual time=0.403..0.531 rows=100 loops=1)" " Buffers: shared read=2" " -> Seq Scan on public.parent_3 p (cost=0.00..89615.83 rows=3332146 width=84) (actual time=0.401..0.518 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = '3'::numeric)" " Buffers: shared read=2" " SubPlan 1" " -> Aggregate (cost=8.61..8.62 rows=1 width=32) (actual time=0.031..0.032 rows=1 loops=100)" " Output: min((c.child_data)::text)" " Buffers: shared hit=385 read=8" " -> Custom Scan (RuntimeAppend) (cost=0.43..8.59 rows=8 width=218) (actual time=0.024..0.025 rows=2 loops=100)" " Output: c.child_data" " Prune by: (c.partition_id = p.partition_id)" " Buffers: shared hit=385 read=8" " -> Index Scan using child_3_parent_id_idx on public.child_3 c_1 (cost=0.43..8.59 rows=8 width=73) (actual time=0.023..0.024 rows=2 loops=100)" " Output: c_1.child_data, c_1.partition_id" " Index Cond: (c_1.parent_id = p.parent_id)" " Filter: (c_1.partition_id = p.partition_id)" " Buffers: shared hit=385 read=8" "Planning time: 22.729 ms" "Executiontime: 4.006 ms" 


Two constants in the partitioning key condition:



select *,
       (select min(c.child_data)
        from public.child c
        where c.partition_id = p.partition_id
          and c.parent_id = p.parent_id)
from public.parent p
where p.partition_id in (2, 3)
limit 100;


postgres:



 "Limit (cost=0.00..2588.75 rows=100 width=116) (actual time=1.420..2.874 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=978 read=15" ... "Planning time: 15.798 ms" "Execution time: 2.988 ms" 


Fully
 "Limit (cost=0.00..2588.75 rows=100 width=116) (actual time=1.420..2.874 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=978 read=15" " -> Result (cost=0.00..172550562.28 rows=6665413 width=116) (actual time=1.418..2.868 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)" " Buffers: shared hit=978 read=15" " -> Append (cost=0.00..179266.65 rows=6665413 width=84) (actual time=0.122..0.170 rows=100 loops=1)" " Buffers: shared read=2" " -> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=282) (actual time=0.002..0.002 rows=0 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))" " -> Seq Scan on public.parent_2 p_1 (cost=0.00..89650.83 rows=3333266 width=84) (actual time=0.119..0.153 rows=100 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=2" " -> Seq Scan on public.parent_3 p_2 (cost=0.00..89615.83 rows=3332146 width=84) (never executed)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Filter: (p_2.partition_id = ANY ('{2,3}'::numeric[]))" " SubPlan 1" " -> Aggregate (cost=25.84..25.85 rows=1 width=32) (actual time=0.026..0.026 rows=1 loops=100)" " Output: min((c.child_data)::text)" " Buffers: shared hit=978 read=13" " -> Append (cost=0.00..25.78 rows=25 width=79) (actual time=0.018..0.025 rows=2 loops=100)" " Buffers: shared hit=978 read=13" " -> Seq Scan on public.child c (cost=0.00..0.00 rows=1 width=218) (actual time=0.000..0.000 rows=0 loops=100)" " Output: c.child_data" " Filter: ((c.partition_id = p.partition_id) AND (c.parent_id = p.parent_id))" " -> Index Scan using child_1_parent_id_idx on public.child_1 c_1 (cost=0.43..8.59 rows=8 width=73) (actual time=0.006..0.006 rows=0 loops=100)" " Output: c_1.child_data" " Index Cond: (c_1.parent_id = p.parent_id)" " Filter: (c_1.partition_id = p.partition_id)" " Buffers: shared hit=297 read=3" " -> Index Scan using child_2_parent_id_idx on public.child_2 c_2 (cost=0.43..8.59 rows=8 width=73) (actual time=0.011..0.011 rows=2 loops=100)" " Output: c_2.child_data" " Index Cond: (c_2.parent_id = p.parent_id)" " Filter: (c_2.partition_id = p.partition_id)" " Buffers: shared hit=384 read=7" " -> Index Scan using child_3_parent_id_idx on public.child_3 c_3 (cost=0.43..8.59 rows=8 width=73) (actual time=0.007..0.007 rows=0 loops=100)" " Output: c_3.child_data" " Index Cond: (c_3.parent_id = p.parent_id)" " Filter: (c_3.partition_id = p.partition_id)" " Buffers: shared hit=297 read=3" "Planning time: 15.798 ms" "Execution time: 2.988 ms" 


pg_pathman:



 "Limit (cost=0.00..865.96 rows=100 width=314) (actual time=0.672..1.692 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=384 read=9" ... "Planning time: 17.897 ms" "Execution time: 1.774 ms" 


Fully
 "Limit (cost=0.00..865.96 rows=100 width=314) (actual time=0.672..1.692 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, ((SubPlan 1))" " Buffers: shared hit=384 read=9" " -> Result (cost=0.00..57719674.22 rows=6665412 width=314) (actual time=0.670..1.683 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, (SubPlan 1)" " Buffers: shared hit=384 read=9" " -> Append (cost=0.00..179266.65 rows=6665412 width=84) (actual time=0.126..0.270 rows=100 loops=1)" " Buffers: shared read=2" " -> Seq Scan on public.parent_2 p (cost=0.00..89650.83 rows=3333266 width=84) (actual time=0.124..0.257 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=2" " -> Seq Scan on public.parent_3 p_1 (cost=0.00..89615.83 rows=3332146 width=84) (never executed)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))" " SubPlan 1" " -> Aggregate (cost=8.61..8.62 rows=1 width=32) (actual time=0.013..0.014 rows=1 loops=100)" " Output: min((c.child_data)::text)" " Buffers: shared hit=384 read=7" " -> Custom Scan (RuntimeAppend) (cost=0.43..8.59 rows=8 width=218) (actual time=0.010..0.010 rows=2 loops=100)" " Output: c.child_data" " Prune by: (c.partition_id = p.partition_id)" " Buffers: shared hit=384 read=7" " -> Index Scan using child_2_parent_id_idx on public.child_2 c_1 (cost=0.43..8.59 rows=8 width=73) (actual time=0.009..0.010 rows=2 loops=100)" " Output: c_1.child_data, c_1.partition_id" " Index Cond: (c_1.parent_id = p.parent_id)" " Filter: (c_1.partition_id = p.partition_id)" " Buffers: shared hit=384 read=7" "Planning time: 17.897 ms" "Execution time: 1.774 ms" 


Partitioning key from the table:



select *,
       (select min(p.parent_data)
        from public.parent p
        where r.partition_id = p.partition_id
          and r.parent_id = p.parent_id)
from public.result r
where r.result_id = 50
limit 100;


postgres:



 "Limit (cost=0.00..2743.60 rows=100 width=48) (actual time=1.751..41.373 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))" " Buffers: shared hit=655 read=409" ... "Planning time: 10.229 ms" "Execution time: 41.462 ms" 


Fully
 "Limit (cost=0.00..2743.60 rows=100 width=48) (actual time=1.751..41.373 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))" " Buffers: shared hit=655 read=409" " -> Seq Scan on public.result r (cost=0.00..250463.09 rows=9129 width=48) (actual time=1.748..41.339 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, (SubPlan 1)" " Filter: (r.result_id = '50'::numeric)" " Rows Removed by Filter: 9709" " Buffers: shared hit=655 read=409" " SubPlan 1" " -> Aggregate (cost=25.36..25.37 rows=1 width=32) (actual time=0.382..0.382 rows=1 loops=100)" " Output: min((p.parent_data)::text)" " Buffers: shared hit=655 read=346" " -> Append (cost=0.00..25.35 rows=4 width=109) (actual time=0.221..0.379 rows=1 loops=100)" " Buffers: shared hit=655 read=346" " -> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=218) (actual time=0.000..0.000 rows=0 loops=100)" " Output: p.parent_data" " Filter: ((r.partition_id = p.partition_id) AND (r.parent_id = p.parent_id))" " -> Index Scan using parent_1_pkey on public.parent_1 p_1 (cost=0.43..8.45 rows=1 width=73) (actual time=0.218..0.219 rows=1 loops=100)" " Output: p_1.parent_data" " Index Cond: (r.parent_id = p_1.parent_id)" " Filter: (r.partition_id = p_1.partition_id)" " Buffers: shared hit=222 read=178" " -> Index Scan using parent_2_pkey on public.parent_2 p_2 (cost=0.43..8.45 rows=1 width=73) (actual time=0.078..0.078 rows=0 loops=100)" " Output: p_2.parent_data" " Index Cond: (r.parent_id = p_2.parent_id)" " Filter: (r.partition_id = p_2.partition_id)" " Buffers: shared hit=216 read=85" " -> Index Scan using parent_3_pkey on public.parent_3 p_3 (cost=0.43..8.45 rows=1 width=73) (actual time=0.077..0.077 rows=0 loops=100)" " Output: p_3.parent_data" " Index Cond: (r.parent_id = p_3.parent_id)" " Filter: (r.partition_id = p_3.partition_id)" " Buffers: shared hit=217 read=83" "Planning time: 10.229 ms" "Execution time: 41.462 ms" 


pg_pathman:



 "Limit (cost=0.00..1052.85 rows=100 width=48) (actual time=0.946..27.732 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))" " Buffers: shared hit=222 read=241" ... "Planning time: 13.332 ms" "Executiontime: 27.792 ms" 


Fully
 "Limit (cost=0.00..1052.85 rows=100 width=48) (actual time=0.946..27.732 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, ((SubPlan 1))" " Buffers: shared hit=222 read=241" " -> Seq Scan on public.result r (cost=0.00..96114.52 rows=9129 width=48) (actual time=0.944..27.685 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, (SubPlan 1)" " Filter: (r.result_id = '50'::numeric)" " Rows Removed by Filter: 9709" " Buffers: shared hit=222 read=241" " SubPlan 1" " -> Aggregate (cost=8.45..8.46 rows=1 width=32) (actual time=0.245..0.245 rows=1 loops=100)" " Output: min((p.parent_data)::text)" " Buffers: shared hit=222 read=178" " -> Custom Scan (RuntimeAppend) (cost=0.43..8.45 rows=1 width=218) (actual time=0.232..0.233 rows=1 loops=100)" " Output: p.parent_data" " Prune by: (r.partition_id = p.partition_id)" " Buffers: shared hit=222 read=178" " -> Index Scan using parent_1_pkey on public.parent_1 p_1 (cost=0.43..8.45 rows=1 width=73) (actual time=0.230..0.231 rows=1 loops=100)" " Output: p_1.parent_data, p_1.partition_id" " Index Cond: (r.parent_id = p_1.parent_id)" " Filter: (r.partition_id = p_1.partition_id)" " Buffers: shared hit=222 read=178" "Planning time: 13.332 ms" "Executiontime: 27.792 ms" 


With the subquery things get more interesting: in the standard mode, the constant is not pushed down into the subquery, and all partitions are scanned there. pg_pathman again brings in its RuntimeAppend step, and as a result the queries read 400-500 blocks instead of about 1000. With a larger number of partitions the gain will only grow.



The second query also shows an interesting feature of the Postgres Append step: partition 3 was not scanned at all (never executed), i.e. the step found the required hundred rows in the second partition and never descended into the third.



7.3. Query joining partitioned tables combined through UNION



One constant in the partitioning key condition:



select *
from public.parent p,
     (select partition_id, parent_id, parent_data from public.parent
      union all
      select partition_id, parent_id, child_data from public.child) u
where u.partition_id = p.partition_id
  and u.parent_id = p.parent_id
  and p.partition_id = 3
limit 100;


postgres:



 "Limit (cost=1.72..12.33 rows=100 width=168) (actual time=1.448..1.795 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=9 read=9" ... "Planning time: 10.951 ms" "Execution time: 1.945 ms" 


Fully
 "Limit (cost=1.72..12.33 rows=100 width=168) (actual time=1.448..1.795 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=9 read=9" " -> Merge Join (cost=1.72..1060723.28 rows=9994523 width=168) (actual time=1.447..1.785 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Merge Cond: (p.parent_id = parent.parent_id)" " Buffers: shared hit=9 read=9" " -> Merge Append (cost=0.56..184533.19 rows=3332147 width=84) (actual time=0.660..0.677 rows=32 loops=1)" " Sort Key: p.parent_id" " Buffers: shared hit=1 read=4" " -> Index Scan using parent_pkey on public.parent p (cost=0.12..8.14 rows=1 width=282) (actual time=0.003..0.003 rows=0 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = '3'::numeric)" " Buffers: shared hit=1" " -> Index Scan using parent_3_pkey on public.parent_3 p_1 (cost=0.43..142873.20 rows=3332146 width=84) (actual time=0.656..0.667 rows=32 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = '3'::numeric)" " Buffers: shared read=4" " -> Materialize (cost=1.15..742928.19 rows=9994523 width=84) (actual time=0.781..1.064 rows=100 loops=1)" " Output: parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=8 read=5" " -> Merge Append (cost=1.15..717941.88 rows=9994523 width=84) (actual time=0.773..1.025 rows=100 loops=1)" " Sort Key: parent.parent_id" " Buffers: shared hit=8 read=5" " -> Index Scan using parent_pkey on public.parent (cost=0.12..8.14 rows=1 width=282) (actual time=0.004..0.004 rows=0 loops=1)" " Output: parent.partition_id, parent.parent_id, parent.parent_data" " Filter: (parent.partition_id = '3'::numeric)" " Buffers: shared hit=1" " -> Index Scan using parent_3_pkey on public.parent_3 (cost=0.43..142873.20 rows=3332146 width=84) (actual time=0.008..0.023 rows=33 loops=1)" " Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Filter: (parent_3.partition_id = '3'::numeric)" " Buffers: shared hit=4" " -> Index Scan using ix_child_parent on public.child (cost=0.12..8.14 rows=1 width=282) (actual time=0.004..0.004 rows=0 loops=1)" " Output: child.partition_id, child.parent_id, child.child_data" " Filter: (child.partition_id = '3'::numeric)" " Buffers: shared hit=1" " -> Index Scan using child_3_parent_id_idx on public.child_3 (cost=0.43..350175.59 rows=6662375 width=84) (actual time=0.755..0.953 rows=68 loops=1)" " Output: child_3.partition_id, child_3.parent_id, child_3.child_data" " Filter: (child_3.partition_id = '3'::numeric)" " Buffers: shared hit=2 read=5" "Planning time: 10.951 ms" "Execution time: 1.945 ms" 


pg_pathman:



 "Limit (cost=1.31..794053.33 rows=2 width=564) (actual time=0.926..1.108 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Buffers: shared hit=6 read=9" ... "Planning time: 13.963 ms" "Executiontime: 1.197 ms" 


Fully
 "Limit (cost=1.31..794053.33 rows=2 width=564) (actual time=0.926..1.108 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Buffers: shared hit=6 read=9" " -> Merge Join (cost=1.31..794053.33 rows=2 width=564) (actual time=0.925..1.097 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Merge Cond: (p.parent_id = parent_3.parent_id)" " Buffers: shared hit=6 read=9" " -> Merge Append (cost=0.44..142832.56 rows=3332146 width=84) (actual time=0.435..0.448 rows=32 loops=1)" " Sort Key: p.parent_id" " Buffers: shared read=4" " -> Index Scan using parent_3_pkey on public.parent_3 p (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.435..0.445 rows=32 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = '3'::numeric)" " Buffers: shared read=4" " -> Materialize (cost=0.87..517958.89 rows=9994521 width=282) (actual time=0.482..0.609 rows=100 loops=1)" " Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Buffers: shared hit=6 read=5" " -> Merge Append (cost=0.87..492972.59 rows=9994521 width=282) (actual time=0.477..0.586 rows=100 loops=1)" " Sort Key: parent_3.parent_id" " Buffers: shared hit=6 read=5" " -> Index Scan using parent_3_pkey on public.parent_3 (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.008..0.017 rows=33 loops=1)" " Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Filter: (parent_3.partition_id = '3'::numeric)" " Buffers: shared hit=4" " -> Index Scan using child_3_parent_id_idx on public.child_3 (cost=0.43..350140.00 rows=6662375 width=84) (actual time=0.469..0.549 rows=68 loops=1)" " Output: child_3.partition_id, child_3.parent_id, child_3.child_data" " Filter: (child_3.partition_id = '3'::numeric)" " Buffers: shared hit=2 read=5" "Planning time: 13.963 ms" "Executiontime: 1.197 ms" 


Two constants in the partitioning key condition:



select *
from public.parent p,
     (select partition_id, parent_id, parent_data from public.parent
      union all
      select partition_id, parent_id, child_data from public.child) u
where u.partition_id = p.partition_id
  and u.parent_id = p.parent_id
  and p.partition_id in (2, 3)
limit 100;


postgres:



 "Limit (cost=3.97..2222.49 rows=100 width=168) (actual time=6.006..6.250 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=11 read=24" ... "Planning time: 15.399 ms" "Execution time: 6.391 ms" 


Fully
 "Limit (cost=3.97..2222.49 rows=100 width=168) (actual time=6.006..6.250 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=11 read=24" " -> Merge Join (cost=3.97..3327548.38 rows=149989 width=168) (actual time=6.006..6.235 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Merge Cond: (p.parent_id = parent.parent_id)" " Join Filter: (p.partition_id = parent.partition_id)" " Buffers: shared hit=11 read=24" " -> Merge Append (cost=1.01..408151.48 rows=6665413 width=84) (actual time=1.012..1.029 rows=35 loops=1)" " Sort Key: p.parent_id" " Buffers: shared hit=1 read=8" " -> Index Scan using parent_pkey on public.parent p (cost=0.12..8.14 rows=1 width=282) (actual time=0.003..0.003 rows=0 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared hit=1" " -> Index Scan using parent_2_pkey on public.parent_2 p_1 (cost=0.43..142962.29 rows=3333266 width=84) (actual time=0.516..0.520 rows=18 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Index Scan using parent_3_pkey on public.parent_3 p_2 (cost=0.43..142873.20 rows=3332146 width=84) (actual time=0.491..0.500 rows=18 loops=1)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Filter: (p_2.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Materialize (cost=2.96..2452767.42 rows=29997730 width=84) (actual time=4.978..5.135 rows=173 loops=1)" " Output: parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=10 read=16" " -> Merge Append (cost=2.96..2377773.10 rows=29997730 width=84) (actual time=4.964..5.057 rows=173 loops=1)" " Sort Key: parent.parent_id" " Buffers: shared hit=10 read=16" " -> Index Scan using parent_pkey on public.parent (cost=0.12..8.14 rows=1 width=282) (actual time=0.003..0.003 rows=0 loops=1)" " Output: parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=1" " -> Index Scan using parent_1_pkey on public.parent_1 (cost=0.43..134721.74 rows=3334606 width=84) (actual time=0.411..0.415 rows=25 loops=1)" " Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared read=4" " -> Index Scan using parent_2_pkey on public.parent_2 (cost=0.43..134629.12 rows=3333266 width=84) (actual time=0.005..0.006 rows=19 loops=1)" " Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data" " Buffers: shared hit=4" " -> Index Scan using parent_3_pkey on public.parent_3 (cost=0.43..134542.83 rows=3332146 width=84) (actual time=0.005..0.010 rows=18 loops=1)" " Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Buffers: shared hit=4" " -> Index Scan using ix_child_parent on public.child (cost=0.12..8.14 rows=1 width=282) (actual time=0.001..0.001 rows=0 loops=1)" " Output: child.partition_id, child.parent_id, child.child_data" " Buffers: shared hit=1" " -> Index Scan using child_1_parent_id_idx on public.child_1 (cost=0.43..331794.06 rows=6666670 width=84) (actual time=3.310..3.321 rows=50 loops=1)" " Output: child_1.partition_id, child_1.parent_id, child_1.child_data" " Buffers: shared read=4" " -> Index Scan using child_2_parent_id_idx on public.child_2 (cost=0.43..333623.06 rows=6668665 width=84) (actual time=0.553..0.553 
rows=29 loops=1)" " Output: child_2.partition_id, child_2.parent_id, child_2.child_data" " Buffers: shared read=4" " -> Index Scan using child_3_parent_id_idx on public.child_3 (cost=0.43..333519.65 rows=6662375 width=84) (actual time=0.670..0.679 rows=37 loops=1)" " Output: child_3.partition_id, child_3.parent_id, child_3.child_data" " Buffers: shared read=4" "Planning time: 15.399 ms" "Execution time: 6.391 ms" 


pg_pathman:



 "Limit (cost=3.54..2230075.76 rows=1 width=564) (actual time=3.679..4.028 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared hit=8 read=24" ... "Planning time: 20.042 ms" "Execution time: 4.162 ms" 


Fully
 "Limit (cost=3.54..2230075.76 rows=1 width=564) (actual time=3.679..4.028 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared hit=8 read=24" " -> Merge Join (cost=3.54..2230075.76 rows=1 width=564) (actual time=3.678..4.014 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Merge Cond: (p.parent_id = parent_1.parent_id)" " Join Filter: (p.partition_id = parent_1.partition_id)" " Buffers: shared hit=8 read=24" " -> Merge Append (cost=0.87..285722.16 rows=6665412 width=84) (actual time=1.279..1.299 rows=35 loops=1)" " Sort Key: p.parent_id" " Buffers: shared read=8" " -> Index Scan using parent_2_pkey on public.parent_2 p (cost=0.43..142889.60 rows=3333266 width=84) (actual time=0.492..0.498 rows=18 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Index Scan using parent_3_pkey on public.parent_3 p_1 (cost=0.43..142832.55 rows=3332146 width=84) (actual time=0.786..0.793 rows=18 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Materialize (cost=2.67..1477724.15 rows=29997728 width=282) (actual time=2.381..2.618 rows=173 loops=1)" " Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared hit=8 read=16" " -> Merge Append (cost=2.67..1402729.83 rows=29997728 width=282) (actual time=2.371..2.575 rows=173 loops=1)" " Sort Key: parent_1.parent_id" " Buffers: shared hit=8 read=16" " -> Index Scan using parent_1_pkey on public.parent_1 (cost=0.43..134675.44 rows=3334606 width=84) (actual time=0.482..0.508 rows=25 loops=1)" " Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared read=4" " -> Index Scan using parent_2_pkey on public.parent_2 (cost=0.43..134599.76 rows=3333266 width=84) (actual time=0.007..0.016 rows=19 loops=1)" " Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data" " Buffers: shared hit=4" " -> Index Scan using parent_3_pkey on public.parent_3 (cost=0.43..134526.41 rows=3332146 width=84) (actual time=0.006..0.016 rows=18 loops=1)" " Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Buffers: shared hit=4" " -> Index Scan using child_1_parent_id_idx on public.child_1 (cost=0.43..331789.90 rows=6666670 width=84) (actual time=0.473..0.483 rows=50 loops=1)" " Output: child_1.partition_id, child_1.parent_id, child_1.child_data" " Buffers: shared read=4" " -> Index Scan using child_2_parent_id_idx on public.child_2 (cost=0.43..333620.20 rows=6668665 width=84) (actual time=0.716..0.724 rows=29 loops=1)" " Output: child_2.partition_id, child_2.parent_id, child_2.child_data" " Buffers: shared read=4" " -> Index Scan using child_3_parent_id_idx on public.child_3 (cost=0.43..333517.99 rows=6662375 width=84) (actual time=0.679..0.691 rows=37 loops=1)" " Output: child_3.partition_id, child_3.parent_id, child_3.child_data" " Buffers: shared read=4" "Planning time: 20.042 ms" "Execution time: 4.162 ms" 


Partitioning key taken from another table:



 select * from public.result r, ( select partition_id, parent_id, parent_data from public.parent union all select partition_id, parent_id, child_data from public.child ) u where u.partition_id = r.partition_id and u.parent_id = r.parent_id and r.result_id = 50 limit 100; 


postgres:



 "Limit (cost=199.17..27826.77 rows=100 width=100) (actual time=10.168..39.515 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=393 read=271" ... "Planning time: 14.982 ms" "Execution time: 39.747 ms" 


Full plan
 "Limit (cost=199.17..27826.77 rows=100 width=100) (actual time=10.168..39.515 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=393 read=271" " -> Nested Loop (cost=199.17..423453.93 rows=1532 width=100) (actual time=10.166..39.486 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, parent.partition_id, parent.parent_id, parent.parent_data" " Buffers: shared hit=393 read=271" " -> Bitmap Heap Scan on public.result r (cost=199.17..7003.62 rows=9129 width=16) (actual time=9.257..10.275 rows=31 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id" " Recheck Cond: (r.result_id = '50'::numeric)" " Heap Blocks: exact=14" " Buffers: shared read=52" " -> Bitmap Index Scan on idx_result (cost=0.00..196.89 rows=9129 width=0) (actual time=7.507..7.507 rows=10051 loops=1)" " Index Cond: (r.result_id = '50'::numeric)" " Buffers: shared read=38" " -> Append (cost=0.00..45.33 rows=29 width=84) (actual time=0.249..0.939 rows=3 loops=31)" " Buffers: shared hit=393 read=219" " -> Seq Scan on public.parent (cost=0.00..0.00 rows=1 width=282) (actual time=0.001..0.001 rows=0 loops=31)" " Output: parent.partition_id, parent.parent_id, parent.parent_data" " Filter: ((r.partition_id = parent.partition_id) AND (r.parent_id = parent.parent_id))" " -> Index Scan using parent_1_pkey on public.parent_1 (cost=0.43..7.19 rows=1 width=84) (actual time=0.244..0.245 rows=1 loops=31)" " Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Index Cond: (parent_1.parent_id = r.parent_id)" " Filter: (r.partition_id = parent_1.partition_id)" " Buffers: shared hit=71 read=53" " -> Index Scan using parent_2_pkey on public.parent_2 (cost=0.43..7.19 rows=1 width=84) (actual time=0.102..0.102 rows=0 loops=31)" " Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data" " Index Cond: (parent_2.parent_id = r.parent_id)" " Filter: (r.partition_id = parent_2.partition_id)" " Buffers: shared hit=68 read=26" " -> Index Scan using parent_3_pkey on public.parent_3 (cost=0.43..7.19 rows=1 width=84) (actual time=0.104..0.104 rows=0 loops=31)" " Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Index Cond: (parent_3.parent_id = r.parent_id)" " Filter: (r.partition_id = parent_3.partition_id)" " Buffers: shared hit=67 read=26" " -> Seq Scan on public.child (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=31)" " Output: child.partition_id, child.parent_id, child.child_data" " Filter: ((r.partition_id = child.partition_id) AND (r.parent_id = child.parent_id))" " -> Index Scan using child_1_parent_id_idx on public.child_1 (cost=0.43..7.92 rows=8 width=84) (actual time=0.194..0.194 rows=2 loops=31)" " Output: child_1.partition_id, child_1.parent_id, child_1.child_data" " Index Cond: (child_1.parent_id = r.parent_id)" " Filter: (r.partition_id = child_1.partition_id)" " Buffers: shared hit=64 read=56" " -> Index Scan using child_2_parent_id_idx on public.child_2 (cost=0.43..7.92 rows=8 width=84) (actual time=0.144..0.144 rows=0 loops=30)" " Output: child_2.partition_id, child_2.parent_id, child_2.child_data" " Index Cond: (child_2.parent_id = r.parent_id)" " Filter: (r.partition_id = child_2.partition_id)" " Buffers: shared hit=62 read=28" " -> Index Scan using child_3_parent_id_idx on public.child_3 (cost=0.43..7.92 rows=8 width=84) (actual time=0.145..0.145 rows=0 loops=30)" " Output: child_3.partition_id, child_3.parent_id, 
child_3.child_data" " Index Cond: (child_3.parent_id = r.parent_id)" " Filter: (r.partition_id = child_3.partition_id)" " Buffers: shared hit=61 read=30" "Planning time: 14.982 ms" "Execution time: 39.747 ms" 


pg_pathman:



 "Limit (cost=199.60..411799.35 rows=1 width=298) (actual time=7.118..37.285 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared hit=393 read=271" ... "Planning time: 20.112 ms" "Executiontime: 37.466 ms" 


Full plan
 "Limit (cost=199.60..411799.35 rows=1 width=298) (actual time=7.118..37.285 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared hit=393 read=271" " -> Nested Loop (cost=199.60..411799.35 rows=1 width=298) (actual time=7.115..37.259 rows=100 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id, parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Buffers: shared hit=393 read=271" " -> Bitmap Heap Scan on public.result r (cost=199.17..7003.62 rows=9129 width=16) (actual time=6.511..7.714 rows=31 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id" " Recheck Cond: (r.result_id = '50'::numeric)" " Heap Blocks: exact=14" " Buffers: shared read=52" " -> Bitmap Index Scan on idx_result (cost=0.00..196.89 rows=9129 width=0) (actual time=5.642..5.642 rows=10051 loops=1)" " Index Cond: (r.result_id = '50'::numeric)" " Buffers: shared read=38" " -> Append (cost=0.43..44.07 rows=27 width=282) (actual time=0.209..0.950 rows=3 loops=31)" " Buffers: shared hit=393 read=219" " -> Index Scan using parent_1_pkey on public.parent_1 (cost=0.43..6.77 rows=1 width=84) (actual time=0.207..0.208 rows=1 loops=31)" " Output: parent_1.partition_id, parent_1.parent_id, parent_1.parent_data" " Index Cond: (parent_1.parent_id = r.parent_id)" " Filter: (r.partition_id = parent_1.partition_id)" " Buffers: shared hit=71 read=53" " -> Index Scan using parent_2_pkey on public.parent_2 (cost=0.43..6.77 rows=1 width=84) (actual time=0.102..0.102 rows=0 loops=31)" " Output: parent_2.partition_id, parent_2.parent_id, parent_2.parent_data" " Index Cond: (parent_2.parent_id = r.parent_id)" " Filter: (r.partition_id = parent_2.partition_id)" " Buffers: shared hit=68 read=26" " -> Index Scan using parent_3_pkey on public.parent_3 (cost=0.43..6.77 rows=1 width=84) (actual time=0.121..0.121 rows=0 loops=31)" " Output: parent_3.partition_id, parent_3.parent_id, parent_3.parent_data" " Index Cond: (parent_3.parent_id = r.parent_id)" " Filter: (r.partition_id = parent_3.partition_id)" " Buffers: shared hit=67 read=26" " -> Index Scan using child_1_parent_id_idx on public.child_1 (cost=0.43..7.92 rows=8 width=84) (actual time=0.260..0.261 rows=2 loops=31)" " Output: child_1.partition_id, child_1.parent_id, child_1.child_data" " Index Cond: (child_1.parent_id = r.parent_id)" " Filter: (r.partition_id = child_1.partition_id)" " Buffers: shared hit=64 read=56" " -> Index Scan using child_2_parent_id_idx on public.child_2 (cost=0.43..7.92 rows=8 width=84) (actual time=0.125..0.125 rows=0 loops=30)" " Output: child_2.partition_id, child_2.parent_id, child_2.child_data" " Index Cond: (child_2.parent_id = r.parent_id)" " Filter: (r.partition_id = child_2.partition_id)" " Buffers: shared hit=62 read=28" " -> Index Scan using child_3_parent_id_idx on public.child_3 (cost=0.43..7.92 rows=8 width=84) (actual time=0.129..0.129 rows=0 loops=30)" " Output: child_3.partition_id, child_3.parent_id, child_3.child_data" " Index Cond: (child_3.parent_id = r.parent_id)" " Filter: (r.partition_id = child_3.partition_id)" " Buffers: shared hit=61 read=30" "Planning time: 20.112 ms" "Executiontime: 37.466 ms" 


With the union subquery, the single constant did limit the partitions scanned, but pg_pathman only removed the main (empty parent) tables from the plan without changing its structure. In the two-constant query the constants affected only the parent table and its partitions; the subquery itself was not optimized either with or without the extension. In the third query the RuntimeAppend step did not appear, so the difference from regular partitioning is insignificant. That said, pg_pathman can still be used here.
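If pruning inside such a union subquery matters, one possible workaround (my own sketch, not something from the article, assuming the same schema and the same partition_id in (2, 3) constants) is to duplicate the partitioning key constants into each branch of the union, so that both the standard constraint exclusion and pg_pathman can prune every branch:

 -- hedged sketch: the constants are repeated in each union branch by hand,
 -- letting the planner prune the parent_* and child_* partitions separately
 select *
 from (
     select partition_id, parent_id, parent_data
     from public.parent
     where partition_id in (2, 3)
     union all
     select partition_id, parent_id, child_data
     from public.child
     where partition_id in (2, 3)
 ) u
 where u.partition_id in (2, 3)
 limit 100;

The downside is obvious: every constant has to be repeated in every branch, which is exactly the kind of manual query rewriting mentioned in the conclusions below.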



7.4. Query by condition



One constant in the partitioning key condition:



 select * from public.parent p where (p.partition_id, p.parent_id) in (select c.partition_id, c.parent_id from public.child c where c.child_index = 5000 and c.partition_id = 3) limit 100; 


postgres:



 "Limit (cost=273.21..858.33 rows=69 width=84) (actual time=5.816..30.351 rows=82 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=134 read=279" ... "Planning time: 16.899 ms" "Execution time: 30.710 ms" 


Full plan
 "Limit (cost=273.21..858.33 rows=69 width=84) (actual time=5.816..30.351 rows=82 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=134 read=279" " -> Nested Loop (cost=273.21..858.33 rows=69 width=84) (actual time=5.807..30.321 rows=82 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=134 read=279" " -> HashAggregate (cost=273.21..273.90 rows=69 width=12) (actual time=5.228..5.276 rows=82 loops=1)" " Output: c.partition_id, c.parent_id" " Group Key: c.partition_id, c.parent_id" " Buffers: shared read=85" " -> Append (cost=0.00..272.87 rows=69 width=12) (actual time=1.185..5.143 rows=82 loops=1)" " Buffers: shared read=85" " -> Seq Scan on public.child c (cost=0.00..0.00 rows=1 width=64) (actual time=0.005..0.005 rows=0 loops=1)" " Output: c.partition_id, c.parent_id" " Filter: ((c.child_index = '5000'::numeric) AND (c.partition_id = '3'::numeric))" " -> Bitmap Heap Scan on public.child_3 c_1 (cost=4.96..272.87 rows=68 width=11) (actual time=1.180..5.118 rows=82 loops=1)" " Output: c_1.partition_id, c_1.parent_id" " Recheck Cond: (c_1.child_index = '5000'::numeric)" " Filter: (c_1.partition_id = '3'::numeric)" " Heap Blocks: exact=82" " Buffers: shared read=85" " -> Bitmap Index Scan on child_3_child_index_idx (cost=0.00..4.94 rows=68 width=0) (actual time=1.120..1.120 rows=82 loops=1)" " Index Cond: (c_1.child_index = '5000'::numeric)" " Buffers: shared read=3" " -> Append (cost=0.00..8.45 rows=2 width=84) (actual time=0.302..0.304 rows=1 loops=82)" " Buffers: shared hit=134 read=194" " -> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=82)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: ((p.partition_id = '3'::numeric) AND (c.parent_id = p.parent_id))" " -> Index Scan using parent_3_pkey on public.parent_3 p_1 (cost=0.43..8.45 rows=1 width=84) (actual time=0.299..0.300 rows=1 loops=82)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Index Cond: (p_1.parent_id = c.parent_id)" " Filter: (p_1.partition_id = '3'::numeric)" " Buffers: shared hit=134 read=194" "Planning time: 16.899 ms" "Execution time: 30.710 ms" 


pg_pathman:



 "Limit (cost=273.64..849.17 rows=68 width=282) (actual time=11.865..34.016 rows=82 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=134 read=279" ... "Planning time: 21.265 ms" "Executiontime: 34.098 ms" 


Full plan
 "Limit (cost=273.64..849.17 rows=68 width=282) (actual time=11.865..34.016 rows=82 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=134 read=279" " -> Nested Loop (cost=273.64..849.17 rows=68 width=282) (actual time=11.862..33.985 rows=82 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=134 read=279" " -> HashAggregate (cost=273.21..273.89 rows=68 width=11) (actual time=11.323..11.364 rows=82 loops=1)" " Output: c.partition_id, c.parent_id" " Group Key: c.partition_id, c.parent_id" " Buffers: shared read=85" " -> Append (cost=4.96..272.87 rows=68 width=11) (actual time=1.940..11.150 rows=82 loops=1)" " Buffers: shared read=85" " -> Bitmap Heap Scan on public.child_3 c (cost=4.96..272.87 rows=68 width=11) (actual time=1.938..11.132 rows=82 loops=1)" " Output: c.partition_id, c.parent_id" " Recheck Cond: (c.child_index = '5000'::numeric)" " Filter: (c.partition_id = '3'::numeric)" " Heap Blocks: exact=82" " Buffers: shared read=85" " -> Bitmap Index Scan on child_3_child_index_idx (cost=0.00..4.94 rows=68 width=0) (actual time=1.655..1.655 rows=82 loops=1)" " Index Cond: (c.child_index = '5000'::numeric)" " Buffers: shared read=3" " -> Append (cost=0.43..8.45 rows=1 width=84) (actual time=0.274..0.275 rows=1 loops=82)" " Buffers: shared hit=134 read=194" " -> Index Scan using parent_3_pkey on public.parent_3 p (cost=0.43..8.45 rows=1 width=84) (actual time=0.273..0.274 rows=1 loops=82)" " Output: p.parent_id, p.parent_data, p.partition_id" " Index Cond: (p.parent_id = c.parent_id)" " Filter: (p.partition_id = '3'::numeric)" " Buffers: shared hit=134 read=194" "Planning time: 21.265 ms" "Executiontime: 34.098 ms" 


Two constants in the partitioning key condition:



 select * from public.parent p where (p.partition_id, p.parent_id) in (select c.partition_id, c.parent_id from public.child c where c.child_index = 5000 and c.partition_id in (2, 3)) limit 100; 


postgres:



 "Limit (cost=538.59..3078.59 rows=100 width=84) (actual time=20.449..90.114 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=510 read=642" ... "Planning time: 14.024 ms" "Execution time: 90.484 ms" 


Full plan
 "Limit (cost=538.59..3078.59 rows=100 width=84) (actual time=20.449..90.114 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=510 read=642" " -> Nested Loop (cost=538.59..3967.59 rows=135 width=84) (actual time=20.446..90.045 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=510 read=642" " -> HashAggregate (cost=538.59..539.94 rows=135 width=11) (actual time=19.387..19.486 rows=100 loops=1)" " Output: c.partition_id, c.parent_id" " Group Key: c.partition_id, c.parent_id" " Buffers: shared read=155" " -> Append (cost=0.00..537.91 rows=135 width=11) (actual time=1.090..19.200 rows=149 loops=1)" " Buffers: shared read=155" " -> Seq Scan on public.child c (cost=0.00..0.00 rows=1 width=64) (actual time=0.005..0.005 rows=0 loops=1)" " Output: c.partition_id, c.parent_id" " Filter: ((c.partition_id = ANY ('{2,3}'::numeric[])) AND (c.child_index = '5000'::numeric))" " -> Bitmap Heap Scan on public.child_2 c_1 (cost=4.94..265.05 rows=66 width=11) (actual time=1.083..5.456 rows=67 loops=1)" " Output: c_1.partition_id, c_1.parent_id" " Recheck Cond: (c_1.child_index = '5000'::numeric)" " Filter: (c_1.partition_id = ANY ('{2,3}'::numeric[]))" " Heap Blocks: exact=67" " Buffers: shared read=70" " -> Bitmap Index Scan on child_2_child_index_idx (cost=0.00..4.93 rows=66 width=0) (actual time=1.003..1.003 rows=67 loops=1)" " Index Cond: (c_1.child_index = '5000'::numeric)" " Buffers: shared read=3" " -> Bitmap Heap Scan on public.child_3 c_2 (cost=4.96..272.87 rows=68 width=11) (actual time=1.271..13.668 rows=82 loops=1)" " Output: c_2.partition_id, c_2.parent_id" " Recheck Cond: (c_2.child_index = '5000'::numeric)" " Filter: (c_2.partition_id = ANY ('{2,3}'::numeric[]))" " Heap Blocks: exact=82" " Buffers: shared read=85" " -> Bitmap Index Scan on child_3_child_index_idx (cost=0.00..4.94 rows=68 width=0) (actual time=0.781..0.781 rows=82 loops=1)" " Index Cond: (c_2.child_index = '5000'::numeric)" " Buffers: shared read=3" " -> Append (cost=0.00..25.35 rows=4 width=84) (actual time=0.615..0.704 rows=1 loops=100)" " Buffers: shared hit=510 read=487" " -> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=100)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: ((c.partition_id = p.partition_id) AND (c.parent_id = p.parent_id))" " -> Index Scan using parent_1_pkey on public.parent_1 p_1 (cost=0.43..8.45 rows=1 width=84) (actual time=0.193..0.193 rows=0 loops=100)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Index Cond: (p_1.parent_id = c.parent_id)" " Filter: (c.partition_id = p_1.partition_id)" " Buffers: shared hit=171 read=129" " -> Index Scan using parent_2_pkey on public.parent_2 p_2 (cost=0.43..8.45 rows=1 width=84) (actual time=0.225..0.225 rows=0 loops=100)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Index Cond: (p_2.parent_id = c.parent_id)" " Filter: (c.partition_id = p_2.partition_id)" " Buffers: shared hit=170 read=172" " -> Index Scan using parent_3_pkey on public.parent_3 p_3 (cost=0.43..8.45 rows=1 width=84) (actual time=0.280..0.280 rows=1 loops=99)" " Output: p_3.parent_id, p_3.parent_data, p_3.partition_id" " Index Cond: (p_3.parent_id = c.parent_id)" " Filter: (c.partition_id = p_3.partition_id)" " Buffers: shared hit=169 read=186" "Planning time: 14.024 ms" "Execution time: 90.484 ms" 


pg_pathman:



 "Limit (cost=539.01..1385.69 rows=100 width=282) (actual time=17.913..48.317 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=152 read=403" ... "Planning time: 14.037 ms" "Execution time: 48.413 ms" 


Full plan
 "Limit (cost=539.01..1385.69 rows=100 width=282) (actual time=17.913..48.317 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=152 read=403" " -> Nested Loop (cost=539.01..1673.56 rows=134 width=282) (actual time=17.911..48.286 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=152 read=403" " -> HashAggregate (cost=538.58..539.92 rows=134 width=11) (actual time=17.359..17.411 rows=100 loops=1)" " Output: c.partition_id, c.parent_id" " Group Key: c.partition_id, c.parent_id" " Buffers: shared read=155" " -> Append (cost=4.94..537.91 rows=134 width=11) (actual time=0.503..17.149 rows=149 loops=1)" " Buffers: shared read=155" " -> Bitmap Heap Scan on public.child_2 c (cost=4.94..265.05 rows=66 width=11) (actual time=0.503..7.763 rows=67 loops=1)" " Output: c.partition_id, c.parent_id" " Recheck Cond: (c.child_index = '5000'::numeric)" " Filter: (c.partition_id = ANY ('{2,3}'::numeric[]))" " Heap Blocks: exact=67" " Buffers: shared read=70" " -> Bitmap Index Scan on child_2_child_index_idx (cost=0.00..4.93 rows=66 width=0) (actual time=0.392..0.392 rows=67 loops=1)" " Index Cond: (c.child_index = '5000'::numeric)" " Buffers: shared read=3" " -> Bitmap Heap Scan on public.child_3 c_1 (cost=4.96..272.87 rows=68 width=11) (actual time=0.591..9.346 rows=82 loops=1)" " Output: c_1.partition_id, c_1.parent_id" " Recheck Cond: (c_1.child_index = '5000'::numeric)" " Filter: (c_1.partition_id = ANY ('{2,3}'::numeric[]))" " Heap Blocks: exact=82" " Buffers: shared read=85" " -> Bitmap Index Scan on child_3_child_index_idx (cost=0.00..4.94 rows=68 width=0) (actual time=0.357..0.357 rows=82 loops=1)" " Index Cond: (c_1.child_index = '5000'::numeric)" " Buffers: shared read=3" " -> Custom Scan (RuntimeAppend) (cost=0.43..8.45 rows=1 width=84) (actual time=0.299..0.300 rows=1 loops=100)" " Output: p.parent_id, p.parent_data, p.partition_id" " Prune by: (c.partition_id = p.partition_id)" " Buffers: shared hit=152 read=248" " -> Index Scan using parent_2_pkey on public.parent_2 p_1 (cost=0.43..8.45 rows=1 width=84) (actual time=0.293..0.293 rows=1 loops=42)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Index Cond: (p_1.parent_id = c.parent_id)" " Filter: (c.partition_id = p_1.partition_id)" " Buffers: shared hit=62 read=106" " -> Index Scan using parent_3_pkey on public.parent_3 p_2 (cost=0.43..8.45 rows=1 width=84) (actual time=0.301..0.302 rows=1 loops=58)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Index Cond: (p_2.parent_id = c.parent_id)" " Filter: (c.partition_id = p_2.partition_id)" " Buffers: shared hit=90 read=142" "Planning time: 14.037 ms" "Execution time: 48.413 ms" 


Partitioning key taken from another table:



 select * from public.parent p where (p.partition_id, p.parent_id) in (select r.partition_id, r.parent_id from public.result r where r.result_id = 50) limit 100; 


postgres:



 "Limit (cost=7049.27..9084.40 rows=100 width=84) (actual time=593.094..692.189 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=506 read=5583" ... "Planning time: 8.554 ms" "Execution time: 692.468 ms" 


Full plan
 "Limit (cost=7049.27..9084.40 rows=100 width=84) (actual time=593.094..692.189 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=506 read=5583" " -> Nested Loop (cost=7049.27..192836.06 rows=9129 width=84) (actual time=593.091..692.134 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=506 read=5583" " -> HashAggregate (cost=7049.27..7140.51 rows=9124 width=11) (actual time=592.396..592.519 rows=100 loops=1)" " Output: r.partition_id, r.parent_id" " Group Key: r.partition_id, r.parent_id" " Buffers: shared read=5094" " -> Bitmap Heap Scan on public.result r (cost=199.17..7003.62 rows=9129 width=11) (actual time=10.049..576.423 rows=10051 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id" " Recheck Cond: (r.result_id = '50'::numeric)" " Heap Blocks: exact=5056" " Buffers: shared read=5094" " -> Bitmap Index Scan on idx_result (cost=0.00..196.89 rows=9129 width=0) (actual time=9.087..9.087 rows=10051 loops=1)" " Index Cond: (r.result_id = '50'::numeric)" " Buffers: shared read=38" " -> Append (cost=0.00..20.31 rows=4 width=84) (actual time=0.733..0.994 rows=1 loops=100)" " Buffers: shared hit=506 read=489" " -> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=282) (actual time=0.000..0.000 rows=0 loops=100)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: ((r.partition_id = p.partition_id) AND (r.parent_id = p.parent_id))" " -> Index Scan using parent_1_pkey on public.parent_1 p_1 (cost=0.43..6.77 rows=1 width=84) (actual time=0.315..0.316 rows=0 loops=100)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Index Cond: (p_1.parent_id = r.parent_id)" " Filter: (r.partition_id = p_1.partition_id)" " Buffers: shared hit=170 read=158" " -> Index Scan using parent_2_pkey on public.parent_2 p_2 (cost=0.43..6.77 rows=1 width=84) (actual time=0.355..0.358 rows=0 loops=99)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Index Cond: (p_2.parent_id = r.parent_id)" " Filter: (r.partition_id = p_2.partition_id)" " Buffers: shared hit=168 read=170" " -> Index Scan using parent_3_pkey on public.parent_3 p_3 (cost=0.43..6.77 rows=1 width=84) (actual time=0.318..0.319 rows=0 loops=99)" " Output: p_3.parent_id, p_3.parent_data, p_3.partition_id" " Index Cond: (p_3.parent_id = r.parent_id)" " Filter: (r.partition_id = p_3.partition_id)" " Buffers: shared hit=168 read=161" "Planning time: 8.554 ms" "Execution time: 692.468 ms" 


pg_pathman:



 "Limit (cost=7049.70..7728.77 rows=100 width=282) (actual time=531.946..564.520 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=137 read=5358" ... "Planning time: 8.863 ms" "Executiontime: 564.741 ms" 


Full plan
 "Limit (cost=7049.70..7728.77 rows=100 width=282) (actual time=531.946..564.520 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=137 read=5358" " -> Nested Loop (cost=7049.70..69008.61 rows=9124 width=282) (actual time=531.943..564.483 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Buffers: shared hit=137 read=5358" " -> HashAggregate (cost=7049.27..7140.51 rows=9124 width=11) (actual time=531.407..531.479 rows=100 loops=1)" " Output: r.partition_id, r.parent_id" " Group Key: r.partition_id, r.parent_id" " Buffers: shared read=5094" " -> Bitmap Heap Scan on public.result r (cost=199.17..7003.62 rows=9129 width=11) (actual time=11.477..518.114 rows=10051 loops=1)" " Output: r.result_id, r.parent_id, r.partition_id" " Recheck Cond: (r.result_id = '50'::numeric)" " Heap Blocks: exact=5056" " Buffers: shared read=5094" " -> Bitmap Index Scan on idx_result (cost=0.00..196.89 rows=9129 width=0) (actual time=10.655..10.655 rows=10051 loops=1)" " Index Cond: (r.result_id = '50'::numeric)" " Buffers: shared read=38" " -> Custom Scan (RuntimeAppend) (cost=0.43..6.77 rows=1 width=84) (actual time=0.320..0.322 rows=1 loops=100)" " Output: p.parent_id, p.parent_data, p.partition_id" " Prune by: (r.partition_id = p.partition_id)" " Buffers: shared hit=137 read=264" " -> Index Scan using parent_1_pkey on public.parent_1 p_1 (cost=0.43..6.77 rows=1 width=84) (actual time=0.329..0.330 rows=1 loops=28)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Index Cond: (p_1.parent_id = r.parent_id)" " Filter: (r.partition_id = p_1.partition_id)" " Buffers: shared hit=37 read=75" " -> Index Scan using parent_2_pkey on public.parent_2 p_2 (cost=0.43..6.77 rows=1 width=84) (actual time=0.313..0.317 rows=1 loops=40)" " Output: p_2.parent_id, p_2.parent_data, p_2.partition_id" " Index Cond: (p_2.parent_id = r.parent_id)" " Filter: (r.partition_id = p_2.partition_id)" " Buffers: shared hit=56 read=105" " -> Index Scan using parent_3_pkey on public.parent_3 p_3 (cost=0.43..6.77 rows=1 width=84) (actual time=0.317..0.317 rows=1 loops=32)" " Output: p_3.parent_id, p_3.parent_data, p_3.partition_id" " Index Cond: (p_3.parent_id = r.parent_id)" " Filter: (r.partition_id = p_3.partition_id)" " Buffers: shared hit=44 read=84" "Planning time: 8.863 ms" "Executiontime: 564.741 ms" 


The first query held no surprises; it behaved like the ones examined earlier. In the second, however, the extension kicked in and roughly halved the number of blocks read. This is a classic case of partition pruning: in the RuntimeAppend step the two partitions were probed exactly one hundred times in total (42 + 58), i.e. only the partitions that were actually needed were searched. In the third query the pruning also works correctly, but the number of reads does not drop severalfold, because the result table contains too many rows for the given value and they are trimmed to the required hundred only afterwards. In a typical database this would be fine: regular tables there should not be large (otherwise they would be partitioned themselves), and pruning unneeded partitions of the large tables yields a significant reduction in scanned blocks.



A query with an exists condition behaved in exactly the same way; here is an example:



 select * from public.parent p where exists (select 1 from public.child c where p.partition_id = c.partition_id and p.parent_id = c.parent_id and c.child_index = 5000 and c.partition_id in (2, 3)) limit 100; 
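To confirm that it is really the RuntimeAppend step that makes the difference, the plan can be compared with the step switched off. A minimal sketch, assuming your pg_pathman build exposes the pg_pathman.enable_runtimeappend setting (check the documentation for your version):

 -- hedged sketch: disable RuntimeAppend, re-run the query, compare the plans
 set pg_pathman.enable_runtimeappend = off;  -- assumed GUC name, verify for your release
 explain (analyze, verbose, buffers)
 select * from public.parent p
 where (p.partition_id, p.parent_id) in
       (select c.partition_id, c.parent_id
          from public.child c
         where c.child_index = 5000
           and c.partition_id in (2, 3))
 limit 100;
 reset pg_pathman.enable_runtimeappend;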


7.5. Query against 30 partitions



I was quite disappointed that RuntimeAppend often does not fire when there are two constants in the condition on the partitioning key, so I decided to test a query against more partitions. A schema with the same tables was created, each split into 30 partitions using the pg_pathman extension, the data was loaded into it, and the following query was tested using the approach from section 7.1:



 select * from public.parent p, public.child c where c.partition_id = p.partition_id and c.parent_id = p.parent_id and p.partition_id in (2, 3) limit 100; 


Its plan:



 "Limit (cost=17.38..294.92 rows=100 width=628) (actual time=31.483..33.538 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=131" ... "Planning time: 48.911 ms" "Executiontime: 34.143 ms" 


Full plan
 "Limit (cost=17.38..294.92 rows=100 width=628) (actual time=31.483..33.538 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=131" " -> Merge Join (cost=17.38..1318082.71 rows=474913 width=628) (actual time=31.482..33.514 rows=100 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id, c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Merge Cond: (p.parent_id = c.parent_id)" " Join Filter: (p.partition_id = c.partition_id)" " Buffers: shared hit=2 read=131" " -> Merge Append (cost=0.85..28624.08 rows=666900 width=84) (actual time=2.625..2.651 rows=48 loops=1)" " Sort Key: p.parent_id" " Buffers: shared read=8" " -> Index Scan using parent_2_pkey on public.parent_2 p (cost=0.42..14289.87 rows=332887 width=84) (actual time=1.362..1.370 rows=28 loops=1)" " Output: p.parent_id, p.parent_data, p.partition_id" " Filter: (p.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Index Scan using parent_3_pkey on public.parent_3 p_1 (cost=0.42..14334.21 rows=334013 width=84) (actual time=1.262..1.272 rows=21 loops=1)" " Output: p_1.parent_id, p_1.parent_data, p_1.partition_id" " Filter: (p_1.partition_id = ANY ('{2,3}'::numeric[]))" " Buffers: shared read=4" " -> Materialize (cost=13.49..1059692.47 rows=20004382 width=98) (actual time=28.789..30.453 rows=1369 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared hit=2 read=123" " -> Merge Append (cost=13.49..1009681.51 rows=20004382 width=98) (actual time=28.782..30.170 rows=1369 loops=1)" " Sort Key: c.parent_id" " Buffers: shared hit=2 read=123" " -> Index Scan using child_1_parent_id_idx on public.child_1 c (cost=0.42..32766.35 rows=664299 width=97) (actual time=0.895..0.909 rows=47 loops=1)" " Output: c.child_id, c.parent_id, c.child_data, c.child_index, c.partition_id" " Buffers: shared read=4" " -> Index Scan using child_2_parent_id_idx on public.child_2 c_1 (cost=0.42..33032.19 rows=664696 width=97) (actual time=0.835..1.048 rows=62 loops=1)" " Output: c_1.child_id, c_1.parent_id, c_1.child_data, c_1.child_index, c_1.partition_id" " Buffers: shared read=5" " -> Index Scan using child_3_parent_id_idx on public.child_3 c_2 (cost=0.42..33625.31 rows=668577 width=98) (actual time=0.849..0.858 rows=39 loops=1)" " Output: c_2.child_id, c_2.parent_id, c_2.child_data, c_2.child_index, c_2.partition_id" " Buffers: shared read=4" " -> Index Scan using child_4_parent_id_idx on public.child_4 c_3 (cost=0.42..32859.69 rows=668831 width=98) (actual time=0.746..0.772 rows=58 loops=1)" " Output: c_3.child_id, c_3.parent_id, c_3.child_data, c_3.child_index, c_3.partition_id" " Buffers: shared read=4" " -> Index Scan using child_5_parent_id_idx on public.child_5 c_4 (cost=0.42..34070.24 rows=667684 width=98) (actual time=1.037..1.043 rows=44 loops=1)" " Output: c_4.child_id, c_4.parent_id, c_4.child_data, c_4.child_index, c_4.partition_id" " Buffers: shared read=4" " -> Index Scan using child_6_parent_id_idx on public.child_6 c_5 (cost=0.42..34413.73 rows=668521 width=98) (actual time=1.248..1.255 rows=63 loops=1)" " Output: c_5.child_id, c_5.parent_id, c_5.child_data, c_5.child_index, c_5.partition_id" " Buffers: shared read=4" " -> Index Scan using child_7_parent_id_idx on public.child_7 c_6 (cost=0.42..33858.17 rows=664992 width=97) (actual time=1.084..1.089 rows=31 loops=1)" " Output: c_6.child_id, c_6.parent_id, c_6.child_data, 
c_6.child_index, c_6.partition_id" " Buffers: shared read=4" " -> Index Scan using child_8_parent_id_idx on public.child_8 c_7 (cost=0.42..33396.85 rows=666651 width=98) (actual time=0.674..0.804 rows=64 loops=1)" " Output: c_7.child_id, c_7.parent_id, c_7.child_data, c_7.child_index, c_7.partition_id" " Buffers: shared read=5" " -> Index Scan using child_9_parent_id_idx on public.child_9 c_8 (cost=0.42..33535.73 rows=665594 width=98) (actual time=1.142..1.148 rows=53 loops=1)" " Output: c_8.child_id, c_8.parent_id, c_8.child_data, c_8.child_index, c_8.partition_id" " Buffers: shared read=4" " -> Index Scan using child_10_parent_id_idx on public.child_10 c_9 (cost=0.42..34376.76 rows=667885 width=97) (actual time=1.112..1.117 rows=36 loops=1)" " Output: c_9.child_id, c_9.parent_id, c_9.child_data, c_9.child_index, c_9.partition_id" " Buffers: shared read=4" " -> Index Scan using child_11_parent_id_idx on public.child_11 c_10 (cost=0.42..34520.23 rows=667304 width=98) (actual time=0.875..0.883 rows=43 loops=1)" " Output: c_10.child_id, c_10.parent_id, c_10.child_data, c_10.child_index, c_10.partition_id" " Buffers: shared read=4" " -> Index Scan using child_12_parent_id_idx on public.child_12 c_11 (cost=0.42..34516.73 rows=667192 width=97) (actual time=0.725..0.731 rows=37 loops=1)" " Output: c_11.child_id, c_11.parent_id, c_11.child_data, c_11.child_index, c_11.partition_id" " Buffers: shared read=4" " -> Index Scan using child_13_parent_id_idx on public.child_13 c_12 (cost=0.42..34775.47 rows=665684 width=98) (actual time=0.799..0.807 rows=46 loops=1)" " Output: c_12.child_id, c_12.parent_id, c_12.child_data, c_12.child_index, c_12.partition_id" " Buffers: shared read=4" " -> Index Scan using child_14_parent_id_idx on public.child_14 c_13 (cost=0.42..32759.51 rows=667829 width=97) (actual time=0.952..0.966 rows=51 loops=1)" " Output: c_13.child_id, c_13.parent_id, c_13.child_data, c_13.child_index, c_13.partition_id" " Buffers: shared read=4" " -> Index Scan using child_15_parent_id_idx on public.child_15 c_14 (cost=0.42..33834.34 rows=666471 width=97) (actual time=0.713..0.719 rows=30 loops=1)" " Output: c_14.child_id, c_14.parent_id, c_14.child_data, c_14.child_index, c_14.partition_id" " Buffers: shared read=4" " -> Index Scan using child_16_parent_id_idx on public.child_16 c_15 (cost=0.42..33276.75 rows=665648 width=98) (actual time=0.887..0.892 rows=40 loops=1)" " Output: c_15.child_id, c_15.parent_id, c_15.child_data, c_15.child_index, c_15.partition_id" " Buffers: shared read=4" " -> Index Scan using child_17_parent_id_idx on public.child_17 c_16 (cost=0.42..34426.11 rows=666740 width=98) (actual time=0.820..0.835 rows=50 loops=1)" " Output: c_16.child_id, c_16.parent_id, c_16.child_data, c_16.child_index, c_16.partition_id" " Buffers: shared read=4" " -> Index Scan using child_18_parent_id_idx on public.child_18 c_17 (cost=0.42..31990.71 rows=665008 width=98) (actual time=0.780..0.794 rows=52 loops=1)" " Output: c_17.child_id, c_17.parent_id, c_17.child_data, c_17.child_index, c_17.partition_id" " Buffers: shared read=4" " -> Index Scan using child_19_parent_id_idx on public.child_19 c_18 (cost=0.42..34141.36 rows=669231 width=97) (actual time=0.782..0.792 rows=42 loops=1)" " Output: c_18.child_id, c_18.parent_id, c_18.child_data, c_18.child_index, c_18.partition_id" " Buffers: shared read=4" " -> Index Scan using child_20_parent_id_idx on public.child_20 c_19 (cost=0.42..33065.75 rows=666164 width=98) (actual time=1.134..1.143 rows=44 loops=1)" " Output: c_19.child_id, 
c_19.parent_id, c_19.child_data, c_19.child_index, c_19.partition_id" " Buffers: shared read=4" " -> Index Scan using child_21_parent_id_idx on public.child_21 c_20 (cost=0.42..34236.36 rows=667822 width=98) (actual time=1.122..1.131 rows=53 loops=1)" " Output: c_20.child_id, c_20.parent_id, c_20.child_data, c_20.child_index, c_20.partition_id" " Buffers: shared read=4" " -> Index Scan using child_22_parent_id_idx on public.child_22 c_21 (cost=0.42..33809.84 rows=664828 width=97) (actual time=1.334..1.344 rows=52 loops=1)" " Output: c_21.child_id, c_21.parent_id, c_21.child_data, c_21.child_index, c_21.partition_id" " Buffers: shared read=4" " -> Index Scan using child_23_parent_id_idx on public.child_23 c_22 (cost=0.42..34169.35 rows=668321 width=98) (actual time=1.478..1.482 rows=30 loops=1)" " Output: c_22.child_id, c_22.parent_id, c_22.child_data, c_22.child_index, c_22.partition_id" " Buffers: shared read=4" " -> Index Scan using child_24_parent_id_idx on public.child_24 c_23 (cost=0.42..31693.94 rows=669014 width=98) (actual time=0.936..0.940 rows=30 loops=1)" " Output: c_23.child_id, c_23.parent_id, c_23.child_data, c_23.child_index, c_23.partition_id" " Buffers: shared read=4" " -> Index Scan using child_25_parent_id_idx on public.child_25 c_24 (cost=0.42..34224.95 rows=666004 width=98) (actual time=0.869..0.881 rows=60 loops=1)" " Output: c_24.child_id, c_24.parent_id, c_24.child_data, c_24.child_index, c_24.partition_id" " Buffers: shared read=4" " -> Index Scan using child_26_parent_id_idx on public.child_26 c_25 (cost=0.42..32132.54 rows=665401 width=97) (actual time=1.129..1.138 rows=53 loops=1)" " Output: c_25.child_id, c_25.parent_id, c_25.child_data, c_25.child_index, c_25.partition_id" " Buffers: shared read=4" " -> Index Scan using child_27_parent_id_idx on public.child_27 c_26 (cost=0.42..34930.21 rows=668930 width=97) (actual time=1.307..1.312 rows=42 loops=1)" " Output: c_26.child_id, c_26.parent_id, c_26.child_data, c_26.child_index, c_26.partition_id" " Buffers: shared read=4" " -> Index Scan using child_28_parent_id_idx on public.child_28 c_27 (cost=0.42..33712.49 rows=666743 width=98) (actual time=0.782..0.793 rows=39 loops=1)" " Output: c_27.child_id, c_27.parent_id, c_27.child_data, c_27.child_index, c_27.partition_id" " Buffers: shared read=4" " -> Index Scan using child_29_parent_id_idx on public.child_29 c_28 (cost=0.42..34281.07 rows=665822 width=98) (actual time=0.879..0.886 rows=42 loops=1)" " Output: c_28.child_id, c_28.parent_id, c_28.child_data, c_28.child_index, c_28.partition_id" " Buffers: shared read=4" " -> Index Scan using child_30_parent_id_idx on public.child_30 c_29 (cost=0.42..33248.03 rows=666496 width=98) (actual time=0.816..0.934 rows=65 loops=1)" " Output: c_29.child_id, c_29.parent_id, c_29.child_data, c_29.child_index, c_29.partition_id" " Buffers: shared hit=2 read=5" "Planning time: 48.911 ms" "Executiontime: 34.143 ms" 


The standard cut-off worked for parent; unfortunately, the pg_pathman extension did not kick in on the child table, which in my view is wrong.
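For reference: the article does not show how the 30-partition schema was built, so here is a minimal sketch of what it could look like using pg_pathman's create_range_partitions function (the call below is my assumption, not the author's script; the function and its signature come from the extension's documentation):

 -- hedged sketch: split parent and child by partition_id into 30 range
 -- partitions of width 1 (values 1..30), migrating the existing rows
 create extension if not exists pg_pathman;
 select create_range_partitions('public.parent'::regclass, 'partition_id', 1::numeric, 1::numeric, 30);
 select create_range_partitions('public.child'::regclass, 'partition_id', 1::numeric, 1::numeric, 30);

Hash partitioning would also fit this key; in that case the analogous call is create_hash_partitions('public.parent', 'partition_id', 30).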



8. Conclusions



1) Although regular table partitioning is rather bare-bones, it can prune partitions for queries with a single constant value of the partitioning key, though not for subqueries inside a select. With two or more partitioning key constants in a query, pruning works only for the table on which the condition is placed directly. In other words, this kind of partitioning suits only simple cases; for complex ones you have to heavily optimize the queries and the way they execute, break them into pieces, that is, substantially rework the interaction with the database.



2) Inserts with the pg_pathman extension are roughly two to three times faster. I do not think this will change as the number of partitions grows.



3) In the examples considered, dynamic partition pruning halves the number of blocks the queries need; as the number of partitions grows to the 100-1000 typical of a large database, the difference in required resources will be much greater.



4) When the RuntimeAppend step fires, the pg_pathman extension seriously speeds up queries and inserts, and it really should be used for partitioning tables.



5) However, the RuntimeAppend step does not always fire, and in those cases queries run as before; in other words, there are still problems with query planning. In my experience with Oracle, such a step is always applied whenever information about the required partition is available. Here it often does not appear, perhaps because there are still errors in the cost estimation of query plans. Unfortunately, postgres has no way to trace how a query plan is chosen yet, so I cannot pin down the exact reason.



6) I will point the developers of this extension to the bug found with the record width and the strange plan choices. Since they fix bugs regularly, there is hope these problems will be resolved.



In closing, I want to say a big thank-you to the developers of this extension for their work and wish them continued success!

Source: https://habr.com/ru/post/335644/


