For a long time, one of the most well-known shortcomings of PostgreSQL was the lack of query parallelization. With the release of version 9.6 this ceases to be a problem. A great deal of work has been done on this front, and starting from commit 80558c1, parallel sequential scans appear, which we will become acquainted with in the course of this article.
First, it should be noted that the development of this functionality has been continuous, and some parameters changed their names between commits. This article was written after a checkout made on June 17, and some of the features described here will only be available in version 9.6 beta2.
Compared with release 9.5, new parameters have been added to the configuration file. Here they are:
- max_parallel_workers_per_gather: the maximum number of workers that can take part in a sequential scan of a table;
- min_parallel_relation_size: the minimum size of a relation above which the planner will start to use additional workers;
- parallel_setup_cost: a planner parameter that estimates the cost of starting a new worker;
- parallel_tuple_cost: a planner parameter that estimates the cost of transferring a tuple from a worker to another process;
- force_parallel_mode: a parameter useful for testing: it forces parallelism even on queries for which the planner would otherwise behave differently.
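A quick way to inspect the current values of these parameters from a psql session is shown below (the defaults noted in the comments match the 9.6 documentation):

```sql
-- Inspect the parallel-query settings; 9.6 defaults shown in the comments
SHOW max_parallel_workers_per_gather;  -- 2
SHOW min_parallel_relation_size;       -- 8MB
SHOW parallel_setup_cost;              -- 1000
SHOW parallel_tuple_cost;              -- 0.1
SHOW force_parallel_mode;              -- off
```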
Let's look at how additional workers can be used to speed up the execution of our queries. Create a test table with an INT field and one hundred million records:
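A minimal sketch of such a setup (the table name test and the column name i are illustrative, not taken from the original session):

```sql
-- Create a single-column table and load one hundred million rows
CREATE TABLE test (i int);
INSERT INTO test SELECT generate_series(1, 100000000);
ANALYZE test;
```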
PostgreSQL's max_parallel_workers_per_gather parameter defaults to 2, so two workers will be activated during a parallel sequential scan.
An ordinary sequential scan brings nothing new:
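For example, with the illustrative test table above (plan costs elided):

```sql
-- Without a WHERE clause the planner sticks to a plain, serial Seq Scan
EXPLAIN SELECT * FROM test;
--  Seq Scan on test  (cost=... rows=... width=...)
```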
In fact, it is the presence of a WHERE clause that makes parallelization worthwhile:
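A sketch of the kind of plan this produces (the predicate is illustrative and the cost/timing figures are elided):

```sql
-- A selective filter makes the parallel plan worthwhile
EXPLAIN ANALYZE SELECT * FROM test WHERE i = 1;
--  Gather  (cost=... rows=... width=...) (actual time=...)
--    Workers Planned: 2
--    Workers Launched: 2
--    ->  Parallel Seq Scan on test
--          Filter: (i = 1)
```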
We can repeat the previous query and compare the execution with max_parallel_workers_per_gather set to 0:
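Roughly like this, with the same illustrative predicate:

```sql
-- Disable additional workers and rerun the previous query
SET max_parallel_workers_per_gather TO 0;
EXPLAIN ANALYZE SELECT * FROM test WHERE i = 1;
--  Seq Scan on test  (cost=... rows=... width=...) (actual time=...)
--    Filter: (i = 1)
```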
The execution time is 2.5 times longer.
The planner does not always consider a parallel sequential scan to be the best option. If the query is not selective enough and many tuples have to be transferred from the workers to the leader, it may prefer the "classic" sequential scan:
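For instance, a low-selectivity predicate such as the illustrative one below returns so many rows that passing them all through the Gather node would not pay off:

```sql
-- Half of the table matches, so the planner prefers a plain Seq Scan
EXPLAIN ANALYZE SELECT * FROM test WHERE i < 50000000;
--  Seq Scan on test  (cost=... rows=... width=...) (actual time=...)
--    Filter: (i < 50000000)
```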
In fact, if we try to force the planner to use a parallel sequential scan, we get a worse result:
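One way to force the planner's hand, consistent with the parallel_tuple_cost reset mentioned just below, is to make tuple transfer look free to the planner (a sketch):

```sql
-- Pretend that passing tuples from workers to the leader costs nothing
SET parallel_tuple_cost TO 0;
EXPLAIN ANALYZE SELECT * FROM test WHERE i < 50000000;
--  Gather  (cost=... rows=... width=...) (actual time=...)
--    Workers Planned: 2
--    ->  Parallel Seq Scan on test
--          Filter: (i < 50000000)
```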
The number of workers can be increased up to max_worker_processes (default: 8). Let's restore parallel_tuple_cost to its default value and see what happens if we increase max_parallel_workers_per_gather to 8:
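A sketch of the two steps, again with the illustrative predicate:

```sql
-- Restore the default transfer cost and allow up to 8 workers per Gather node
SET parallel_tuple_cost TO DEFAULT;
SET max_parallel_workers_per_gather TO 8;
EXPLAIN ANALYZE SELECT * FROM test WHERE i = 1;
--  Gather  (cost=... rows=... width=...) (actual time=...)
--    Workers Planned: 6
--    ->  Parallel Seq Scan on test
--          Filter: (i = 1)
```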
Even though PostgreSQL could use up to 8 workers, it used only six. This is because Postgres also optimizes the number of workers based on the size of the table and the min_parallel_relation_size parameter. The number of workers made available to Postgres follows a geometric progression with 3 as the common ratio and min_parallel_relation_size as the scale factor. Here is an example, given that 8MB is the default value:
| Size | Workers |
|---------|---------|
| <8MB | 0 |
| <24MB | 1 |
| <72MB | 2 |
| <216MB | 3 |
| <648MB | 4 |
| <1944MB | 5 |
| <5832MB | 6 |
| ... | ... |
The size of our table is 3548MB, so 6 is the maximum number of workers available to it.
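This can be verified by comparing the relation size with the thresholds above and looking at the number of planned workers (a sketch):

```sql
-- Check the on-disk size of the table
SELECT pg_size_pretty(pg_relation_size('test'));
-- Even with max_parallel_workers_per_gather set to 8, the plan caps at 6 workers
EXPLAIN SELECT * FROM test WHERE i = 1;
--  Gather
--    Workers Planned: 6
--    ->  Parallel Seq Scan on test
```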
Finally, here is a brief demonstration of the improvements achieved by this patch. Running our query with a growing number of workers, we get the following results:
| Workers | Time |
|---------|---------|
| 0 | 24767.848 ms |
| 1 | 14855.961 ms |
| 2 | 10415.661 ms |
| 3 | 8041.187 ms |
| 4 | 8090.855 ms |
| 5 | 8082.937 ms |
| 6 | 8061.939 ms |
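A minimal way to reproduce such a sweep from psql, with \timing reporting the elapsed time of each run (the query is the illustrative one used above):

```sql
\timing on
-- Repeat with the setting raised from 0 up to 6
SET max_parallel_workers_per_gather TO 0;
SELECT * FROM test WHERE i = 1;
SET max_parallel_workers_per_gather TO 1;
SELECT * FROM test WHERE i = 1;
-- ... and so on up to 6
```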
You can see that the run time improves significantly until it settles at about one third of the original value. It is also easy to explain why there is no improvement in going from 3 to 6 workers: the machine on which the tests were performed has 4 CPUs, so the results are stable once 3 additional workers have joined the original process.
In conclusion, with PostgreSQL 9.6 query parallelization has entered a new stage, of which the parallel sequential scan is only the first excellent result. Version 9.6 has also parallelized aggregation, but that is a topic for another article, which will come out in the coming weeks!