Explaining the inexplicable. Part 2

Registration for the PG Day'16 conference is in full swing, and we continue to publish a translation of Hubert Lubaczewski ’s articles on explain and its main components.

Last time I wrote about what the output of explain shows. Now I want to talk more about the different types of "nodes" / operations that you may find in explain plans.

The most basic operation is sequential scanning (Seq Scan).

She looks like this:
')

explain analyze select * from pg_class; QUERY PLAN --------------------------------------------------------------------------------------------------------- Seq Scan on pg_class (cost=0.00..10.92 rows=292 width=202) (actual time=0.009..0.049 rows=295 loops=1) Total runtime: 0.249 ms (2 rows)

This is the simplest operation possible - PostgreSQL opens the file with the table, reads the lines one by one and returns them to the user or the explain tree node above, for example, LIMIT, like this:

 explain analyze select * from pg_class limit 2; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..0.07 rows=2 width=202) (actual time=0.014..0.014 rows=2 loops=1) -> Seq Scan on pg_class (cost=0.00..10.92 rows=292 width=202) (actual time=0.009..0.009 rows=2 loops=1) Total runtime: 0.132 ms (3 rows)

It is important to understand that the order in which rows are returned is not specific. They are not returned "in the order of insertion" or "last updated line - the first", or something else in the same spirit. Parallel samples, updates, deletes, vacuums can change the order of rows at any time.

Seq Scan can filter rows — that is, discard some on return. This happens, for example, when you add a “WHERE" condition:

 explain analyze select * from pg_class where relname ~ 'a'; QUERY PLAN --------------------------------------------------------------------------------------------------------- Seq Scan on pg_class (cost=0.00..11.65 rows=227 width=202) (actual time=0.030..0.294 rows=229 loops=1) Filter: (relname ~ 'a'::text) Rows Removed by Filter: 66 Total runtime: 0.379 ms (4 rows)

As you can see, now we have the information “Filter:”. And, since I have DBMS version 9.2 or newer, I also received the comment "Rows removed by filter" (“Rows removed by filter").

The next node type is “Index Scan".

This type of scan seems very simple, and most people understand at least one use case:

 explain analyze select * from pg_class where oid = 1247; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Index Scan using pg_class_oid_index on pg_class (cost=0.15..8.17 rows=1 width=202) (actual time=0.007..0.007 rows=1 loops=1) Index Cond: (oid = 1247::oid) Total runtime: 0.077 ms (3 rows)

It's simple - we have an index corresponding to the condition, so PostgreSQL:

opens the index;
in the index, if it finds where (in the data of the table) there may be rows corresponding to the given condition:
- opens the table;
- obtains the string (s) indicated by the index;
if the rows can be returned — that is, if they are visible in the current session — they are returned.

Of course, you might ask: how can a string be invisible? This can happen with deleted rows that are still in the table (have not been cleaned by vacuum). Or with lines that have been updated. Or were inserted, but after the current transaction.

Index Scan is also used when you want to sort some data using the sort order in the index, for example:

 explain analyze select * from pg_class order by oid limit 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.15..1.67 rows=10 width=206) (actual time=0.017..0.029 rows=10 loops=1) -> Index Scan using pg_class_oid_index on pg_class (cost=0.15..44.53 rows=292 width=206) (actual time=0.014..0.026 rows=10 loops=1) Total runtime: 0.145 ms (3 rows)

There is no condition here, but we can easily add it like this:

 explain analyze select * from pg_class where oid > 1247 order by oid limit 10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.15..4.03 rows=10 width=206) (actual time=0.021..0.035 rows=10 loops=1) -> Index Scan using pg_class_oid_index on pg_class (cost=0.15..37.84 rows=97 width=206) (actual time=0.017..0.031 rows=10 loops=1) Index Cond: (oid > 1247::oid) Total runtime: 0.132 ms (4 rows)

In these cases, the PG finds the starting point in the index (either the first row that is older than 1247, or just the smallest value in the index), and then simply returns the next row / value until the Limit condition is satisfied.

There is a version of Index Scan called “Index Scan Backward", which does the same thing, but is used to scan in descending order:

 explain analyze select * from pg_class where oid < 1247 order by oid desc limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.15..4.03 rows=10 width=206) (actual time=0.012..0.026 rows=10 loops=1) -> Index Scan Backward using pg_class_oid_index on pg_class (cost=0.15..37.84 rows=97 width=206) (actual time=0.009..0.022 rows=10 loops=1) Index Cond: (oid < 1247::oid) Total runtime: 0.119 ms (4 rows)

This is the same type of operation: open the index and, for each row referenced by the index, extract data from the table. It just does not happen “from the smallest to the largest”, but “from the biggest to the lesser”.

Another similar operation is “Index Only Scan".

Let's create a simple table:

 create table test (id serial primary key, i int4); CREATE TABLE insert into test (i) select random() * 1000000000 from generate_series(1,100000); INSERT 0 100000 vacuum analyze test; VACUUM

This gives us a table like this:

 select * from test limit 10; id | i ----+----------- 1 | 546119592 2 | 253476978 3 | 235791031 4 | 654694043 5 | 187647296 6 | 709050245 7 | 210316749 8 | 348927354 9 | 120463097 10 | 5611946 (10 rows)

Here I have an index by id:

 \d test Table "public.test" Column | Type | Modifiers --------+---------+--------------------------------------------------- id | integer | not null default nextval('test_id_seq'::regclass) i | integer | Indexes: "test_pkey" PRIMARY KEY, btree (id)

So, if certain conditions are fulfilled (I will talk about this in more detail later), I can get a plan like this:

 explain analyze select id from test order by id asc limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.29..0.55 rows=10 width=4) (actual time=0.039..0.042 rows=10 loops=1) -> Index Only Scan using test_pkey on test (cost=0.29..2604.29 rows=100000 width=4) (actual time=0.036..0.038 rows=10 loops=1) Heap Fetches: 0 Total runtime: 0.092 ms (4 rows)

Note the word “Only" in “Index Only Scan".

This means that Postgres realized that I only select data (columns) from the index. And, perhaps, he does not need to check anything in the table files. So it will return data directly from the index.

These scans have become a big change in PostgreSQL 9.2 , as they can work much faster than regular index scans, because they don't need to check anything in the table data.

The difficulty is that in order to work correctly, the Index must contain information that these strings are on pages that have not undergone changes “recently”. That is, to use Index Only Scans, your table should be well cleaned using vacuum. But, with the autovacuum running, this should not be a problem.

The last type of table scan is called Bitmap Index Scan. It looks like this:

 explain analyze select * from test where i < 100000; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on test (cost=4.37..39.99 rows=10 width=8) (actual time=0.025..0.110 rows=13 loops=1) Recheck Cond: (i < 100000) -> Bitmap Index Scan on i1 (cost=0.00..4.37 rows=10 width=0) (actual time=0.013..0.013 rows=13 loops=1) Index Cond: (i < 100000) Total runtime: 0.154 ms (5 rows)

(if you read carefully, you noticed that it uses an index, the creation of which I did not speak about before. This is easy to do: create index i1 on test (i); ).

Bitmap Scans always consist of at least two nodes. First (at the bottom level) is Bitmap Index Scan, and then - Bitmap Heap Scan.

How it works?

Suppose your table has 100,000 pages (about 780MB). Bitmap Index Scan will create a bitmap, where each page of your table will correspond to one bit. So, in this case, we get a memory block of 100,000 bits ~ 12.5 KB. All these bits will be set to 0. Then, Bitmap Index Scan will set some bits to 1, depending on which page of the table may contain the row to be returned.

This part does not affect the data in the table at all. After this is done - that is, when all the pages that contain the lines to be returned are “marked” - this bitmap goes up a level to the Bitmap Heap Scan node, which reads them in a more consistent manner.

What is the meaning of such an operation? Conventional Index Scans cause random I / O operations — pages from disk are loaded in random order. And it is slow. At least on rotating discs.

Sequential scanning is faster when you need to get one page, but, on the other hand, you don’t always need all the pages.

Bitmap Index Scans combines both cases: when you need a lot of rows from the table, but not all, and when the rows that you return are not in the same block (which would be justified if I performed the operation “... where id <. .. "). Bitmap scans have another interesting feature - they can combine two operations or two indices, as in this example:

 explain analyze select * from test where i < 5000000 or i > 950000000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on test (cost=107.36..630.60 rows=5323 width=8) (actual time=1.023..4.353 rows=5386 loops=1) Recheck Cond: ((i < 5000000) OR (i > 950000000)) -> BitmapOr (cost=107.36..107.36 rows=5349 width=0) (actual time=0.922..0.922 rows=0 loops=1) -> Bitmap Index Scan on i1 (cost=0.00..12.25 rows=527 width=0) (actual time=0.120..0.120 rows=491 loops=1) Index Cond: (i < 5000000) -> Bitmap Index Scan on i1 (cost=0.00..92.46 rows=4822 width=0) (actual time=0.799..0.799 rows=4895 loops=1) Index Cond: (i > 950000000) Total runtime: 4.765 ms (8 rows)

Here we see two Bitmap Index scans (there may be more), which are then combined (but not in the same way as with the “JOIN" operation in SQL!) Using BitmapOr.

As you remember, the output Bitmap Index Scan is a bitmap (a block of memory with ones and zeros). Having several such bitmaps, you can easily perform logical operations on them: Or, And, or Not.

Here we see that two such bitmaps were combined using the Or operator, and the resulting bitmap was transferred to the Bitmap Heap Scan, which loaded the appropriate rows from the table.

Although both index scans use the same index here, this is not always the case. For example, let's quickly add some more columns:

 alter table test add column j int4 default random() * 1000000000; ALTER TABLE alter table test add column h int4 default random() * 1000000000; ALTER TABLE create index i2 on test (j); CREATE INDEX create index i3 on test (h); CREATE INDEX

But what happens:

 explain analyze select * from test where j < 50000000 and i < 50000000 and h > 950000000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on test (cost=280.76..323.61 rows=12 width=16) (actual time=2.295..2.352 rows=11 loops=1) Recheck Cond: ((h > 950000000) AND (j < 50000000) AND (i < 50000000)) -> BitmapAnd (cost=280.76..280.76 rows=12 width=0) (actual time=2.278..2.278 rows=0 loops=1) -> Bitmap Index Scan on i3 (cost=0.00..92.53 rows=4832 width=0) (actual time=0.546..0.546 rows=4938 loops=1) Index Cond: (h > 950000000) -> Bitmap Index Scan on i2 (cost=0.00..93.76 rows=4996 width=0) (actual time=0.783..0.783 rows=5021 loops=1) Index Cond: (j < 50000000) -> Bitmap Index Scan on i1 (cost=0.00..93.96 rows=5022 width=0) (actual time=0.798..0.798 rows=4998 loops=1) Index Cond: (i < 50000000) Total runtime: 2.428 ms (10 rows)

Three Bitmap Index Scan scans, each of which uses its own index, bitmaps are combined using the “and" bit operation, and the result is fed to the Bitmap Heap Scan.

If you are wondering why BitmapAnd shows “Actual rows = 0", the answer is simple: this node does not deal with rows at all (only the page page's bitmap). So it cannot return rows.

That's all for now. These are all possible table scans — the ways you get data from disk. Next time I will talk about combining different data sources and other types of plans.

Source: https://habr.com/ru/post/276973/

All Articles