
Explaining the inexplicable

Friends, we are happy to continue publishing interesting materials on the most diverse aspects of working with PostgreSQL. Today's translation opens a whole series of articles by Hubert Lubaczewski, which will certainly interest a wide circle of readers.



One of the first things a new database administrator hears is "use EXPLAIN". And on the first attempt they are confronted with something incomprehensible:
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=146.63..148.65 rows=808 width=138) (actual time=55.009..55.012 rows=71 loops=1)
   Sort Key: n.nspname, p.proname, (pg_get_function_arguments(p.oid))
   Sort Method: quicksort  Memory: 43kB
   ->  Hash Join  (cost=1.14..107.61 rows=808 width=138) (actual time=42.495..54.854 rows=71 loops=1)
         Hash Cond: (p.pronamespace = n.oid)
         ->  Seq Scan on pg_proc p  (cost=0.00..89.30 rows=808 width=78) (actual time=0.052..53.465 rows=2402 loops=1)
               Filter: pg_function_is_visible(oid)
         ->  Hash  (cost=1.09..1.09 rows=4 width=68) (actual time=0.011..0.011 rows=4 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  Seq Scan on pg_namespace n  (cost=0.00..1.09 rows=4 width=68) (actual time=0.005..0.007 rows=4 loops=1)
                     Filter: ((nspname <> 'pg_catalog'::name) AND (nspname <> 'information_schema'::name))

What does it mean?

It is useless to try to understand the explain above right away. Let's start with something simpler. But first, I would like you to understand one important thing:

PostgreSQL remembers

This means that PostgreSQL stores some meta-information (information about the information): the number of rows, the number of distinct values, the most common values, and so on. For large tables this information is based on a random sample, but in general, Postgres really knows a lot about our data.

So let's consider a simple query and explain it:

$ explain select * from test where i = 1;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on test  (cost=0.00..40.00 rows=12 width=4)
   Filter: (i = 1)
(2 rows)

The query is quite simple and, it seems to me, needs no additional comments.

In an explain, the first line and every line beginning with "->" are operations. The remaining lines are additional information for the operation above them.

In our case, there is only one operation — sequential scanning of the test table.

There is also additional information about the filter.

Sequential scanning means that PostgreSQL will "open" the table data and read it. Theoretically, it can filter out (discard) rows but, in general, it is ready to read and return the entire table.

Why ready? I'll explain in a minute.

So, the Seq Scan line informs us that we are scanning the table sequentially, and that the table is called "test" (this is where one of the biggest shortcomings of explain hides: it does not show the schema, which has been pointed out to me more than once).

And what are these numbers in brackets after the operation?

I want to ask you a question. You have this table:

                                Table "public.t"
   Column    |  Type   |                   Modifiers
-------------+---------+------------------------------------------------
 id          | integer | not null default nextval('t_id_seq'::regclass)
 some_column | integer |
 something   | text    |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
    "q" btree (some_column)

Given this table definition and the query:

 SELECT * FROM t where some_column = 123; 

What do you think is the best way to execute this query? Sequentially scan the table, or use the index?

If your answer is "of course use the index, there is an index on this column, so it will be faster", then let me ask: what about the situation when the table contains only one row, and that row has some_column equal to 123?

To perform a sequential scan, I need to read just one page of the table (8192 bytes), and I get the row. To use the index, I need to read a page from the index, check whether there are rows in the table matching the condition, and then read a page from the table.

In the end, twice as much work!

You could say: "Of course, but we are talking about very small tables, so speed does not matter." Fine. Then let's imagine a table with 10 billion rows, each of which has some_column = 123. An index does not just fail to help here; it will actually make things seriously worse.

Of course, if you have a million rows and only one of them has some_column = 123, using the index is the right choice.

Thus, it is impossible to say in general whether a given query will use an index, or whether it should use one at all. To know, you need more information. And this leads us to a simple conclusion: depending on the situation, one method of retrieving data will be better or worse than another.
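To make this trade-off concrete, here is a deliberately simplified back-of-the-envelope model in Python (not PostgreSQL's real planner; the page counts and btree depth below are made-up assumptions) that counts page reads for each access method:

```python
def seqscan_pages(table_pages):
    # A sequential scan always reads every page of the table once.
    return table_pages

def indexscan_pages(index_depth, matching_rows):
    # Toy model: descend the btree (one page per level), then fetch
    # one table page per matching row (worst case: no two matching
    # rows share a page).
    return index_depth + matching_rows

# Tiny table: 1 page, 1 matching row. The index is pure overhead:
# 1 page read for the seq scan vs. 2 for the index scan.
assert seqscan_pages(1) < indexscan_pages(1, 1)

# Big table, highly selective filter: the index wins easily.
# Assume 100,000 pages, btree depth 3, 1 matching row.
assert indexscan_pages(3, 1) < seqscan_pages(100_000)

# Big table where every row matches: the index only adds work.
# Assume 100,000 pages and 10,000,000 matching rows.
assert seqscan_pages(100_000) < indexscan_pages(3, 10_000_000)
```

The same inputs (table size and selectivity) flip the winner, which is exactly why the planner needs statistics before it can choose.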

PostgreSQL (up to a certain point) examines all possible scenarios. It knows how many rows you have and how many rows will (most likely) match the given criteria, so it can make very smart decisions.

But how are these decisions made? That is exactly what the first set of numbers in explain shows: the cost.

Some people think that cost is measured in seconds. It is not. Its unit of measurement is "fetching one page sequentially". That is, cost estimates both time and resource usage.

In postgresql.conf, you might notice the following parameters:

seq_page_cost = 1.0                  # measured on an arbitrary scale
random_page_cost = 4.0               # same scale as above
cpu_tuple_cost = 0.01                # same scale as above
cpu_index_tuple_cost = 0.005         # same scale as above
cpu_operator_cost = 0.0025           # same scale as above

That is, we can even change the cost of a sequential page read. These parameters set the costs that PostgreSQL assumes the different methods of executing the same query will require.
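As an illustration, the total cost of a plain sequential scan can be reproduced by hand from these parameters: pages read times seq_page_cost, plus rows times cpu_tuple_cost, plus cpu_operator_cost per row for each operator in the WHERE clause. Here is a sketch in Python; the assumption that the example table occupies 6 pages is mine, chosen because it reproduces the 18.50 seen in the Seq Scan example in this article:

```python
# Default planner cost constants from postgresql.conf.
seq_page_cost = 1.0
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

def seqscan_cost(pages, rows, operators_per_row=0):
    # Read every page sequentially, process every row, and evaluate
    # each filter operator once per row.
    return (pages * seq_page_cost
            + rows * cpu_tuple_cost
            + rows * operators_per_row * cpu_operator_cost)

# A 6-page, 1000-row table with a one-operator filter (id = 50):
# 6*1.0 + 1000*0.01 + 1000*0.0025 = 18.5
print(seqscan_cost(pages=6, rows=1000, operators_per_row=1))  # -> 18.5
```

This is only the textbook formula for the simplest node; the real planner accounts for many more factors, but the arithmetic above is enough to see where such numbers come from.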

For example, let's create a simple table of 1000 rows with some text and an index:

create table test (id serial primary key, some_text text);
CREATE TABLE
insert into test (some_text) select 'whatever' from generate_series(1,1000);
INSERT 0 1000

Running explain with a condition on id produces the following:

explain select * from test where id = 50;
                               QUERY PLAN
-----------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.28..8.29 rows=1 width=36)
   Index Cond: (id = 50)
(2 rows)

But what if we told Postgres that the index scan must not be used under any circumstances (set enable_indexscan = false)?

explain select * from test where id = 50;
                                QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.28..8.30 rows=1 width=13)
   Recheck Cond: (id = 50)
   ->  Bitmap Index Scan on test_pkey  (cost=0.00..4.28 rows=1 width=0)
         Index Cond: (id = 50)
(4 rows)

Let's disable that too (set enable_bitmapscan = false):

explain select * from test where id = 50;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on test  (cost=0.00..18.50 rows=1 width=13)
   Filter: (id = 50)
(2 rows)

OK, now let's put them side by side:

Index Scan using test_pkey on test  (cost=0.28..8.29 rows=1 width=36)
Bitmap Heap Scan on test  (cost=4.28..8.30 rows=1 width=13)
Seq Scan on test  (cost=0.00..18.50 rows=1 width=13)

By default, Postgres uses the Index Scan. Why? Simple: in this case it is the cheapest option. Its cost is 8.29, while the bitmap heap scan (whatever that is) would cost 8.30, and the seq scan 18.50.

OK, but the cost is given as two numbers: number..number. What are they, and why have I only been talking about the second number? If we took the first number into account, the winner would be the seq scan, since its value is zero, against 0.28 for the index scan and as much as 4.28 for the bitmap heap scan.

Cost is displayed as a range (number..number) because it shows two things: the startup cost, that is, the cost of returning the first row of the operation, and the total cost of returning all the rows (by "all" I mean the rows returned by this operation, not all the rows in the table).
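The two numbers are easy to pull apart mechanically. Here is a hypothetical helper (my own sketch, not part of any PostgreSQL tooling) that extracts startup and total cost from an explain line with a regular expression and then picks the cheapest plan the way the planner does, by total cost:

```python
import re

COST_RE = re.compile(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)")

def parse_cost(plan_line):
    """Return (startup_cost, total_cost) from one explain line."""
    m = COST_RE.search(plan_line)
    if m is None:
        raise ValueError("no cost information in line")
    return float(m.group(1)), float(m.group(2))

plans = [
    "Index Scan using test_pkey on test  (cost=0.28..8.29 rows=1 width=36)",
    "Bitmap Heap Scan on test  (cost=4.28..8.30 rows=1 width=13)",
    "Seq Scan on test  (cost=0.00..18.50 rows=1 width=13)",
]

# The planner compares *total* cost, so the index scan wins here,
# even though the seq scan has the lowest startup cost.
cheapest = min(plans, key=lambda p: parse_cost(p)[1])
print(cheapest)
```

Sorting the same list by `parse_cost(p)[0]` instead would crown the seq scan, which is exactly the confusion described above.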

What is a startup cost? For a seq scan there is none: you just read a page and return the rows. That's all. But, for example, to sort a data set you need to read all of it and actually sort it before you can even think about returning the first row. This is clearly visible in the following explain:

                            QUERY PLAN
-------------------------------------------------------------------
 Sort  (cost=22.88..23.61 rows=292 width=202)
   Sort Key: relfilenode
   ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=202)
(3 rows)

Note that the startup cost of Sort is 22.88, while its total cost is only 23.61. So returning rows from Sort is cheap (in cost terms), but sorting them is not.

The next piece of information in explain is "rows". This is the approximate number of rows that, according to PostgreSQL, this operation can return (it may return fewer, for example when there is a LIMIT). It is also very important for some operations, such as joins. Joining two tables of 20 rows each can be done in any number of ways, and it hardly matters which one. But when you join a table of a million rows with a table of a billion rows, the join method matters a great deal (I am not talking about inner join vs. left join, but about hash join, nested loop and merge join; if you don't know what these are, don't worry, I'll explain everything a little later).

Of course, this number can be estimated incorrectly, for many reasons. Sometimes it matters and sometimes it doesn't. But we'll talk about bad estimates later.

The last piece of information is width. This is PostgreSQL's estimate of how many bytes, on average, a single row returned by this operation contains. For example:

explain select * from pg_class;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=202)
(1 row)

explain select relname, relkind from pg_class;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=65)
(1 row)

As you can see, limiting the list of columns changed the width and, therefore, the amount of data that has to be passed through the query execution.

And now, attention, the most important piece of information: explain plans are trees. The top node requires data from the nodes below it.

Let's take another look at the plan from the beginning of this article.

It has 5 operations: sort, hash join, seq scan, hash and again seq scan. PostgreSQL runs the top operation, the sort, which in turn runs the operation right below it (the hash join) and receives its data from it. The hash join, in order to return rows to the sort, has to run the seq scan (on pg_proc) and the hash (operation #4). And the hash, in order to return its data, has to run the seq scan on pg_namespace.

It is very important to understand that some operations can return rows immediately and, more importantly, gradually (for example, Seq Scan), while others cannot. For example, here we see that the startup cost of Hash (#4) is the same as the total ("all rows") cost of its sub-operation, the seq scan. This means that for the hash operation to start (to return even a single row), it has to read all the rows from its sub-operation.

The part about the gradual return of rows becomes especially important when you start writing functions. Consider this function:

CREATE OR REPLACE FUNCTION public.test()
 RETURNS SETOF integer
 LANGUAGE plpgsql
AS $function$
declare
    i int4;
begin
    for i in 1..3 loop
        return next i;
        perform pg_sleep(1);
    end loop;
    return;
end;
$function$;

If you don't understand it, don't worry. The function returns 3 rows, each containing one integer: 1, 2 and 3. The important part is that it sleeps for 1 second after returning each row.

This means that if I do this:

 select * from test(); 

I have to wait 3 seconds for the results.

But how long will we have to wait in this case:

 select * from test() limit 1; 

Let's take a look:

\timing
Timing is on.
select * from test() limit 1;
 test
------
    1
(1 row)

Time: 3005.334 ms

The same 3 seconds. Why? Because PL/pgSQL (like most, if not all, PL/* languages) cannot return partial results. It looks like it can, thanks to "return next", but in fact all the results are accumulated in a buffer and returned together when the function finishes executing.
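The difference between buffered and streaming results is easy to demonstrate outside the database. In this Python sketch (an analogy I made up, not PostgreSQL code), `buffered()` mimics a PL/pgSQL set-returning function that collects everything before returning, while `streaming()` mimics an operation such as Seq Scan that yields rows one at a time:

```python
from itertools import islice

produced = []  # tracks how many rows were actually generated

def buffered():
    # Like PL/pgSQL "return next": every row is generated up front
    # and handed back only when the whole function is done.
    rows = []
    for i in range(1, 4):
        produced.append(i)
        rows.append(i)
    return rows

def streaming():
    # Like Seq Scan: each row is yielded as soon as it is ready.
    for i in range(1, 4):
        produced.append(i)
        yield i

produced.clear()
list(islice(buffered(), 1))   # "limit 1" over the buffered version
assert produced == [1, 2, 3]  # all three rows were still generated

produced.clear()
list(islice(streaming(), 1))  # "limit 1" over the generator
assert produced == [1]        # work stopped after the first row
```

The `limit 1` consumer is identical in both cases; only the producer's ability to stream changes how much work gets done, which is exactly the 3-second wait above.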

On the other hand, "normal" operations can usually return partial data. You can see this by running something as trivial as a sequential scan on a non-trivial table:

create table t as
    select i as id,
           repeat('depesz', 100)::text as payload
    from generate_series(1,1000000) i;

On this table, we can see that:

explain analyze select * from t;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..185834.82 rows=10250082 width=36) (actual time=0.015..232.380 rows=1000000 loops=1)
 Total runtime: 269.666 ms
(2 rows)

explain analyze select * from t limit 1;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.02 rows=1 width=36) (actual time=0.003..0.003 rows=1 loops=1)
   ->  Seq Scan on t  (cost=0.00..185834.82 rows=10250082 width=36) (actual time=0.003..0.003 rows=1 loops=1)
 Total runtime: 0.016 ms
(3 rows)

(for now, please look only at "Total runtime: ..")

As you can see, the sequential scan finished very quickly, as soon as the LIMIT's appetite was satisfied with exactly 1 row.

Please note that even the costs (which are not the best way to compare queries) show that the top node (the seq scan in the first query and the limit in the second) has very different values for returning all rows: 185834.82 versus 0.02.

So, the first 4 numbers of any operation (the two cost estimates, the row count and the width) are estimates. They may or may not be correct.

The other 4 numbers, which you get when you run "EXPLAIN ANALYZE query" or "EXPLAIN (ANALYZE on) query", show the real results.

The time is again given as a range, but this time it is real time: exactly the time PostgreSQL spent working on this operation (on average, since the same operation may have been run many times). And just like cost, time is represented as a range: the time to return the first row and the time to return all the rows. Let's check this plan:

$ explain analyze select * from t limit 100;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..9.33 rows=100 width=608) (actual time=0.008..0.152 rows=100 loops=1)
   ->  Seq Scan on t  (cost=0.00..93333.86 rows=999986 width=608) (actual time=0.007..0.133 rows=100 loops=1)
 Total runtime: 0.181 ms
(3 rows)

As you can see, the startup time of the Limit operation is 0.008 (the unit here is milliseconds). This is because the Seq Scan (which Limit called to get its data) needed 0.007 ms to return the first row, and then another 0.001 ms was spent on processing inside the Limit itself.

Then (after the first row was returned) Limit kept receiving data from the Seq Scan until it had 100 rows. It then stopped the sequential scan (this happened 0.133 ms into the query) and finished after another 0.019 ms.

The actual rows number, as the name implies, shows how many rows (on average) this operation returned. And loops shows how many times the operation was executed.

When would an operation be executed more than once? For example, in some cases with joins or subqueries. Consider one such plan, in which the third operation is a seq scan inside a subquery.

Note that the third operation of that plan has loops = 2. This means the seq scan was run twice, returned on average 1 row per run, and took on average 0.160 ms per run. So the total time spent in this particular operation is 2 * 0.160 ms = 0.32 ms (this is what the exclusive/inclusive columns on explain.depesz.com show).
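The per-operation totals that explain.depesz.com displays can be recomputed by hand: the actual time reported by EXPLAIN ANALYZE is an average per loop, so the real time spent is time-per-loop multiplied by loops. A tiny sketch of that arithmetic:

```python
def total_time_ms(actual_total_ms, loops):
    # EXPLAIN ANALYZE reports per-loop averages; multiply by the
    # number of loops to get the real time spent in the operation.
    return actual_total_ms * loops

# The seq scan above: 0.160 ms per loop, 2 loops.
print(total_time_ms(0.160, 2))  # -> 0.32

# The later example: 0.003 ms per loop, ~26,000 loops,
# which adds up to almost 79 ms of real work.
print(total_time_ms(0.003, 26_000))
```

This is why a node whose actual time looks negligible can still dominate the query: a tiny per-loop cost times a huge loop count is a big number.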

Very often, poor query performance comes down to the fact that some subquery had to be looped over many, many times.

(Of course, this does not mean Postgres is to blame for choosing such a plan. Perhaps there were simply no other options, or the alternatives were estimated to be even more expensive.)

In the example above, it is worth noting that although the actual time of operation 3 is only 0.003 ms, the operation was executed more than 26,000 times, which adds up to almost 79 ms of elapsed time.

I think this covers the theory needed to read an explain. Most likely you still don't know what all the operations and other pieces of information mean, but at least you now know what the numbers mean, and what the difference is between explain (which shows costs in abstract units, based on statistics gathered from a random sample) and explain analyze (which shows actual times, row counts and execution times, in units that let you compare different queries).

As always, I'm afraid I have missed a lot of things that might be important but didn't catch my eye, or (worse) that I considered "obvious". If you feel something is missing, please let me know and I will try to fill in the gaps as soon as possible.

I also want to add that I plan to develop this text in 2 or 3 more publications, in which I will go into more detail.

I hope this article has been helpful. Subscribe and follow our publications: translations of the next installments will appear on the blog soon! As usual, don't hesitate to leave feedback and suggestions. We will include the most interesting ones in the program of the upcoming PG Day'16 Russia! :)

Source: https://habr.com/ru/post/275851/

