
PostgreSQL indexes - 1

Foreword


This series of articles discusses PostgreSQL indexes.

Any topic can be approached from different angles. We will talk about what matters to an application developer who uses the DBMS: which indexes exist, why PostgreSQL has so many different kinds of them, and how to use them to speed up queries. The topic could probably be covered in fewer words, but we secretly hope for an inquisitive developer who is also interested in the details of the internals, especially since understanding such details lets you not only take other people's word for it, but also draw your own conclusions.

Left outside the scope of this discussion is the development of new index types. That requires knowledge of the C language and belongs to a systems programmer rather than an application developer. For the same reason we will barely touch the programming interfaces and will focus only on what matters for using ready-made indexes.
In this part we will discuss the division of responsibilities between the general indexing mechanism, which belongs to the database core, and the individual index access methods, which can be added as extensions. In the next part we will look at the access method interface and such important concepts as operator classes and operator families. After this long but necessary introduction we will examine in detail the design and application of the various index types: Hash, B-tree, GiST, SP-GiST, GIN and RUM, BRIN and Bloom.

Indexes


PostgreSQL indexes are special database objects designed mainly to speed up data access. They are auxiliary structures: any index can be dropped and rebuilt from the information in the table. One sometimes hears that a DBMS can work without indexes, just slowly. This is not quite true, though, since indexes are also used to enforce some integrity constraints.

Currently, six different kinds of indexes are built into PostgreSQL 9.6, and one more is available as an extension, which became possible thanks to important changes in version 9.6. So new index types can be expected to appear in the near future.

Despite all the differences between the index types (also called access methods), in the end any of them establishes a correspondence between a key (for example, the value of an indexed column) and the table rows in which this key occurs. Rows are identified by a TID (tuple id), which consists of the number of a block in the file and the position of the row inside the block. Then, knowing the key or some information about it, we can quickly read the rows that may contain the data we are interested in, without scanning the whole table.
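The pair (block number, position within the block) can be seen in the system column ctid that every table row has. A minimal illustration with a hypothetical throwaway table, not part of the running example below (three freshly inserted rows typically land at offsets 1 to 3 of block 0):

postgres=# create table tid_demo(n integer);
CREATE TABLE
postgres=# insert into tid_demo values (1), (2), (3);
INSERT 0 3
postgres=# select ctid, n from tid_demo;
ctid | n
-------+---
(0,1) | 1
(0,2) | 2
(0,3) | 3
(3 rows)

postgres=# drop table tid_demo;
DROP TABLE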

It is important to understand that while an index speeds up data access, it requires certain maintenance costs in return. For any operation on indexed data (inserting, deleting or updating table rows) the indexes created for the table must be updated as well, within the same transaction. Note that updating table fields for which no indexes were created does not require the indexes to be updated; this mechanism is called HOT (Heap-Only Tuples).
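As a rough illustration of HOT, here is a sketch with a hypothetical throwaway table (not the table used in the rest of the article); note that the counters in pg_stat_user_tables are updated asynchronously, so the numbers may lag behind for a moment:

postgres=# create table hot_demo(id integer, payload text);
CREATE TABLE
postgres=# create index on hot_demo(id);
CREATE INDEX
postgres=# insert into hot_demo values (1, 'a');
INSERT 0 1
postgres=# update hot_demo set payload = 'b' where id = 1;
UPDATE 1
postgres=# select n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables where relname = 'hot_demo';
n_tup_upd | n_tup_hot_upd
-----------+---------------
1 | 1
(1 row)

postgres=# drop table hot_demo;
DROP TABLE

Since the update touched only the non-indexed column payload, it could be performed as a HOT update and the index was left untouched.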

Extensibility has some implications. To make a new access method easy to plug into the system, PostgreSQL provides a general indexing mechanism. Its main task is to obtain TIDs from the access method and work with them: read the corresponding table row versions, check their visibility, and return the data, either fetching the rows one at a time or first building a bitmap of them.


The indexing mechanism is involved in executing queries; it is invoked according to the plan built at the optimization stage. The optimizer, when enumerating and evaluating the possible ways of executing a query, has to understand the capabilities of every access method that could potentially be applied. Will the method be able to return the data directly in the required order, or should a separate sort step be planned? Can the method be used to search for NULLs? The optimizer solves questions like these all the time.

Information about the access method is needed not only by the optimizer. When creating an index, the system needs to decide: is it possible to build an index over several columns? Can this index provide uniqueness?

So each access method must provide all the necessary information about itself. Up to version 9.6 the pg_am table was used for that; starting with 9.6 the data moved deeper, into special functions. We will get acquainted with this interface a bit later.
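For example, the list of access methods available in an installation can still be seen in pg_am; it is only the contents of the catalog that changed in 9.6, not its existence. On a stock PostgreSQL 9.6 without extensions the query below should return something like this (the exact set and order may differ):

postgres=# select amname from pg_am;
amname
--------
btree
hash
gist
gin
spgist
brin
(6 rows)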

Everything else is the task of the access method itself.


First, we look at the capabilities of the general indexing mechanism, and then proceed to consider the various access methods.

Indexing mechanism


The indexing mechanism allows PostgreSQL to work uniformly with a wide variety of access methods, taking their capabilities into account.

Basic scan methods


Index scan


The TIDs returned by the index can be used in different ways. Consider an example:

postgres=# create table t(a integer, b text, c boolean);
CREATE TABLE
postgres=# insert into t(a,b,c)
select s.id, chr((32+random()*94)::integer), random() < 0.01
from generate_series(1,100000) as s(id)
order by random();
INSERT 0 100000
postgres=# create index on t(a);
CREATE INDEX
postgres=# analyze t;
ANALYZE

We created a table with three fields. The first field contains numbers from 1 to 100,000, and an index (for now it does not matter of which type) has been created on it. The second field contains various printable ASCII characters. Finally, the third field contains a boolean value that is true for roughly 1% of the rows and false for the rest. The rows are inserted into the table in random order.

Let's try to select rows with the condition "a = 1". Note that the condition has the form "indexed-field operator expression", where the operator is "equals" and the expression (the search key) is "1". In most cases the condition must have exactly this form for an index to be usable.

postgres=# explain (costs off) select * from t where a = 1;
QUERY PLAN
-------------------------------
Index Scan using t_a_idx on t
Index Cond: (a = 1)
(2 rows)

In this case the optimizer decided to use an index scan (Index Scan). With an index scan, the access method returns TIDs one at a time until the matching rows run out. The indexing mechanism, in turn, accesses the table pages pointed to by the TIDs, gets the row versions, checks their visibility under the multiversion concurrency rules, and returns the resulting data.

Bitmap scan


An index scan works well when only a few values are involved. However, as the number of retrieved rows grows, so does the chance of returning to the same table page several times. Therefore, in that case the optimizer switches to a bitmap scan:

postgres=# explain (costs off) select * from t where a <= 100;
QUERY PLAN
------------------------------------
Bitmap Heap Scan on t
Recheck Cond: (a <= 100)
-> Bitmap Index Scan on t_a_idx
Index Cond: (a <= 100)
(4 rows)

First, the access method returns all TIDs that match the condition (the Bitmap Index Scan node), and a bitmap of row versions is built from them. Then the row versions are read from the table (Bitmap Heap Scan), with each page read only once.

Note that in the second step the condition may be rechecked (Recheck Cond). The number of retrieved rows may be too large for the row version bitmap to fit entirely in RAM (it is limited by the work_mem parameter). In that case the bitmap is built only for the pages that contain at least one matching row version. Such a "lossy" bitmap takes up less space, but when reading a page the condition has to be rechecked for every row stored there. Note that even for a small selection (as in our example) the "Recheck Cond" step still appears in the plan, although it is not actually performed.
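In EXPLAIN ANALYZE output a page-granularity ("lossy") bitmap shows up in the "Heap Blocks" line. Below is a sketch of how to provoke and observe it; our table is small, so even with the minimal work_mem the bitmap may well stay exact, but on a larger table the lossy counter becomes non-zero (the settings are forced only for the experiment and reset afterwards):

postgres=# set work_mem = '64kB';
SET
postgres=# set enable_seqscan = off;
SET
postgres=# explain (analyze, costs off) select * from t where a <= 50000;
...
postgres=# reset work_mem;
RESET
postgres=# reset enable_seqscan;
RESET

In the plan, look for a line such as "Heap Blocks: exact=... lossy=...".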

If conditions are imposed on several table fields, and those fields are indexed, a bitmap scan allows several indexes to be used at the same time (if the optimizer considers it advantageous). A bitmap of row versions is built for each index, and the bitmaps are then ANDed bit by bit (if the expressions are joined by AND) or ORed bit by bit (if the expressions are joined by OR). For example:

postgres=# create index on t(b);
CREATE INDEX
postgres=# analyze t;
ANALYZE
postgres=# explain (costs off) select * from t where a <= 100 and b = 'a';
QUERY PLAN
--------------------------------------------------
Bitmap Heap Scan on t
Recheck Cond: ((a <= 100) AND (b = 'a'::text))
-> BitmapAnd
-> Bitmap Index Scan on t_a_idx
Index Cond: (a <= 100)
-> Bitmap Index Scan on t_b_idx
Index Cond: (b = 'a'::text)
(7 rows)

Here, the BitmapAnd node combines two bitmaps using the “and” bitwise operation.

A bitmap scan avoids repeated access to the same data page. But what if the data in the table pages happens to be physically ordered in exactly the same way as the index entries? Of course, one cannot fully rely on the physical order of data in pages: if sorted data is needed, the ORDER BY clause must be specified explicitly in the query. But it may well be that "almost all" the data is in fact ordered, for example if rows are added in the right order and never change afterwards, or after the CLUSTER command has been executed. In that case building a bitmap is an unnecessary step, and a plain index scan is no worse (unless we account for the possibility of combining several indexes). Therefore, when choosing an access method, the planner looks at special statistics showing how well ordered the data is:

postgres=# select attname, correlation from pg_stats where tablename = 't';
attname | correlation
---------+-------------
b | 0.533512
c | 0.942365
a | -0.00768816
(3 rows)

Values close to one in absolute magnitude indicate a high degree of ordering (as for column c), while values close to zero, on the contrary, indicate a chaotic distribution (column a).
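As a side note (not part of the article's running example, and be aware that it rewrites the table and changes its physical order, so skip it if you are following along): physically reordering the table by the index and refreshing the statistics should bring the correlation of column a close to one:

postgres=# cluster t using t_a_idx;
CLUSTER
postgres=# analyze t;
ANALYZE
postgres=# select attname, correlation from pg_stats
where tablename = 't' and attname = 'a';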

Sequential scan


For completeness, it should be said that if the condition is not selective enough, the optimizer will prefer a sequential scan of the whole table to using an index:

postgres=# explain (costs off) select * from t where a <= 40000;
QUERY PLAN
------------------------
Seq Scan on t
Filter: (a <= 40000)
(2 rows)

And it will be right. The point is that indexes work better the higher the selectivity of the condition, that is, the fewer rows satisfy it. As the number of retrieved rows grows, so does the overhead of reading the index pages.

The situation is compounded by the fact that sequential reading is faster than reading pages "at random". This is especially true of hard disk drives, where the mechanical operation of moving the head to the track takes much longer than reading the data itself; with SSDs this effect is less pronounced. To account for the difference in access cost there are two parameters, seq_page_cost and random_page_cost, which can be set not only globally but also at the tablespace level, reflecting the characteristics of different disk subsystems.
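A quick look at these settings (the values shown are the stock defaults; your installation may be configured differently):

postgres=# show seq_page_cost;
seq_page_cost
---------------
1
(1 row)

postgres=# show random_page_cost;
random_page_cost
------------------
4
(1 row)

For a tablespace located on an SSD one might, for instance, lower the penalty for random access with alter tablespace fast_ssd set (random_page_cost = 1.1); here fast_ssd is a purely hypothetical tablespace name.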

Covering indexes


As a rule, the main task of an access method is to return the identifiers of matching table rows so that the indexing mechanism can read the necessary data from them. But what if the index already contains all the data needed by the query? Such an index is called a covering index, and in that case the optimizer can use an index-only scan (Index Only Scan):

postgres=# vacuum t;
VACUUM
postgres=# explain (costs off) select a from t where a < 100;
QUERY PLAN
------------------------------------
Index Only Scan using t_a_idx on t
Index Cond: (a < 100)
(2 rows)

The name may suggest that the indexing mechanism does not touch the table at all and gets everything it needs from the access method alone. But this is not quite so, because PostgreSQL indexes do not store information that would allow the visibility of rows to be judged. Therefore, the access method returns all row versions matching the search condition, whether or not they are visible to the current transaction.

However, if the indexing mechanism had to look at the table every time to determine visibility, this scanning method would be no different from a regular index scan.

The problem is solved by the fact that PostgreSQL maintains a so-called visibility map for tables, in which the vacuum process marks pages whose data has not changed for long enough to be visible to all transactions regardless of their start time and isolation level. If the row identifier returned by the index points to such a page, the visibility check can be skipped.

Therefore, regular vacuuming increases the effectiveness of covering indexes. Moreover, the optimizer takes the number of dead (not yet vacuumed) rows into account and may decline to use an index-only scan if it predicts that the visibility checks would be too costly.
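The planner's idea of how much of the table is marked all-visible can be peeked at in pg_class (a side illustration; the numbers depend on when vacuum last ran):

postgres=# select relpages, relallvisible from pg_class where relname = 't';

The closer relallvisible is to relpages, the more often the visibility check can be skipped during an index-only scan.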

The number of times the table actually had to be accessed can be seen with explain analyze:

postgres=# explain (analyze, costs off) select a from t where a < 100;
QUERY PLAN
-------------------------------------------------------------------------------
Index Only Scan using t_a_idx on t (actual time=0.025..0.036 rows=99 loops=1)
Index Cond: (a < 100)
Heap Fetches: 0
Planning time: 0.092 ms
Execution time: 0.059 ms
(5 rows)

In this case the table did not have to be accessed at all (Heap Fetches: 0), since it had just been vacuumed. In general, the closer this number is to zero, the better.

Not all indexes store the indexed values alongside the row identifiers. If an access method cannot return the data, it cannot be used for an index-only scan.

Null


NULLs play an important role in relational databases as a convenient way of representing the fact that a value does not exist or is unknown.

But a special value demands special treatment. Ordinary Boolean logic turns into three-valued logic; it is unclear whether NULL should be considered smaller or larger than ordinary values (hence the special NULLS FIRST and NULLS LAST clauses for sorting); it is not obvious whether NULLs should be taken into account by aggregate functions; the planner needs special statistics for them...

From the point of view of index support, NULLs are also ambiguous: should they be indexed or not? If NULLs are not indexed, the index can be smaller. But if they are indexed, the index can be used for conditions of the form "indexed-field IS [NOT] NULL", and also as a covering index when the query has no condition on the table at all (since in that case the index must return the data of all rows, including those with NULLs).

For each access method its developers decide for themselves whether to index NULLs or not; but, as a rule, they are indexed.
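For example, a B-tree does index NULLs, so our index on column a can in principle serve an IS NULL condition, even though the column happens to contain no NULLs at all (a sketch; the actual plan depends on the collected statistics):

postgres=# explain (costs off) select * from t where a is null;
QUERY PLAN
-------------------------------
Index Scan using t_a_idx on t
Index Cond: (a IS NULL)
(2 rows)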

Multicolumn indexes


Conditions for several fields can be supported using multi-column indices . For example, we could create an index for two fields of our table:

postgres=# create index on t(a,b);
CREATE INDEX
postgres=# analyze t;
ANALYZE

The optimizer is likely to prefer such an index to combining bitmaps, because here we immediately get the necessary TIDs without any additional actions:

postgres=# explain (costs off) select * from t where a <= 100 and b = 'a';
QUERY PLAN
------------------------------------------------
Index Scan using t_a_b_idx on t
Index Cond: ((a <= 100) AND (b = 'a'::text))
(2 rows)

A multi-column index can also be used to speed up selections with a condition on only some of the fields, starting with the first one:

postgres=# explain (costs off) select * from t where a <= 100;
QUERY PLAN
--------------------------------------
Bitmap Heap Scan on t
Recheck Cond: (a <= 100)
-> Bitmap Index Scan on t_a_b_idx
Index Cond: (a <= 100)
(4 rows)

As a rule, if no condition is imposed on the first field, the index will not be used. But in some cases, the optimizer may find that it is still more profitable than sequential scans. We will touch on this topic in more detail when considering btree indexes.

Not all access methods support the creation of indexes across multiple columns.

Expression indexes


We said that the search condition should have the form "indexed-field operator expression". In the example below the index will not be used, because instead of the field itself the condition uses an expression involving it:

postgres=# explain (costs off) select * from t where lower(b) = 'a';
QUERY PLAN
------------------------------------------
Seq Scan on t
Filter: (lower((b)::text) = 'a'::text)
(2 rows)

This particular query is easy to rewrite so that only the field name appears to the left of the operator. But when that is not possible, expression indexes (functional indexes) come to the rescue:

postgres=# create index on t(lower(b));
CREATE INDEX
postgres=# analyze t;
ANALYZE
postgres=# explain (costs off) select * from t where lower(b) = 'a';
QUERY PLAN
----------------------------------------------------
Bitmap Heap Scan on t
Recheck Cond: (lower((b)::text) = 'a'::text)
-> Bitmap Index Scan on t_lower_idx
Index Cond: (lower((b)::text) = 'a'::text)
(4 rows)

A functional index is built not on a table field but on an arbitrary expression; the optimizer will consider such an index for conditions of the form "indexed-expression operator expression". If computing the indexed expression is costly, updating the index will also require significant computing resources.

It should also be borne in mind that separate statistics are gathered for the indexed expression. They can be seen in the pg_stats view under the index name:

postgres=# \d t
Table "public.t"
Column | Type | Modifiers
--------+---------+-----------
a | integer |
b | text |
c | boolean |
Indexes:
"t_a_b_idx" btree (a, b)
"t_a_idx" btree (a)
"t_b_idx" btree (b)
"t_lower_idx" btree (lower(b))

postgres=# select * from pg_stats where tablename = 't_lower_idx';
...

If necessary, the number of histogram buckets can be controlled in the same way as for ordinary table fields (bearing in mind that the column name may vary depending on the indexed expression):

postgres=# \d t_lower_idx
Index "public.t_lower_idx"
Column | Type | Definition
--------+------+------------
lower | text | lower(b)
btree, for table "public.t"

postgres=# alter index t_lower_idx alter column "lower" set statistics 69;
ALTER INDEX

Partial indexes


Sometimes there is a need to index only part of the rows of a table. This is usually connected with a highly non-uniform distribution: it makes sense to look up a rare value using an index, while a frequent value is easier to find by a full scan of the table.

Of course, you can build a regular index on the column "c", and it will work as we expect:

postgres=# create index on t(c);
CREATE INDEX
postgres=# analyze t;
ANALYZE
postgres=# explain (costs off) select * from t where c;
QUERY PLAN
-------------------------------
Index Scan using t_c_idx on t
Index Cond: (c = true)
Filter: c
(3 rows)

postgres=# explain (costs off) select * from t where not c;
QUERY PLAN
-------------------
Seq Scan on t
Filter: (NOT c)
(2 rows)

In this case, the index takes 276 pages:

postgres=# select relpages from pg_class where relname='t_c_idx';
relpages
----------
276
(1 row)

But, since the “c” column is true for only one percent of the rows, 99% of the index is simply never used. In this case, you can build a partial index:

postgres=# create index on t(c) where c;
CREATE INDEX
postgres=# analyze t;
ANALYZE

The size of this index has decreased to 5 pages:

postgres=# select relpages from pg_class where relname='t_c_idx1';
relpages
----------
5
(1 row)
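A quick check (a sketch; with both the full index t_c_idx and the partial t_c_idx1 in place, the planner is expected to prefer the smaller partial one for the condition it covers, although the choice ultimately depends on the cost estimates):

postgres=# explain (costs off) select * from t where c;

The plan should now reference t_c_idx1 rather than t_c_idx.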

In some cases, the difference in volume and performance can be quite substantial.

Sorting


If an access method returns row identifiers in sorted order, this gives the optimizer additional options for executing the query.

You can scan the table and then sort the data:

postgres=# set enable_indexscan=off;
SET
postgres=# explain (costs off) select * from t order by a;
QUERY PLAN
---------------------
Sort
Sort Key: a
-> Seq Scan on t
(3 rows)

Or the data can be read from the index, arriving immediately in sorted order:

postgres=# set enable_indexscan=on;
SET
postgres=# explain (costs off) select * from t order by a;
QUERY PLAN
-------------------------------
Index Scan using t_a_idx on t
(1 row)

Of all the access methods, only btree can return data in sorted order, so we will postpone a more detailed discussion until we consider that index type.
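One small example can be given right away (a sketch; it assumes the planner still considers the index the cheapest option): a btree can also be read backwards, so a descending sort does not require a separate sort step either:

postgres=# explain (costs off) select * from t order by a desc;
QUERY PLAN
----------------------------------------
Index Scan Backward using t_a_idx on t
(1 row)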

Concurrent building


Typically, building an index requires setting a SHARE lock on the table. Such a lock allows you to read data from a table, but prohibits any changes while the index is being built.

You can verify this by running the following query in another session while an index is being built, say, on table t:

postgres=# select mode, granted from pg_locks where relation = 't'::regclass;
mode | granted
-----------+---------
ShareLock | t
(1 row)

If the table is large enough and actively receives inserts, updates or deletes, this may be unacceptable: modifying sessions would wait a long time for the lock to be released.

In this case you can build the index concurrently:

postgres=# create index concurrently on t(a);
CREATE INDEX

This command sets a SHARE UPDATE EXCLUSIVE lock, which allows both reading and modifying data (only changing the table structure is forbidden, as well as simultaneous vacuuming, analyzing, or building another index on the same table).

However, there is a downside. First, the index will be built more slowly than usual, because two passes over the table are made instead of one, and it is also necessary to wait for the completion of concurrent transactions that modify the data.

Second, when the index is built concurrently, a deadlock or a unique-constraint violation may occur. The index is still created, but in an unusable state; in that case it must be dropped and created anew. Unusable indexes are marked with the word INVALID in the output of the psql \d command, and the full list can be obtained with the query:

postgres=# select indexrelid::regclass index_name, indrelid::regclass table_name from pg_index where not indisvalid;
index_name | table_name
------------+------------
t_a_idx | t
(1 row)
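To recover, the invalid index (whatever name the listing above shows) is dropped and built anew, for example concurrently again; a sketch:

postgres=# drop index t_a_idx;
DROP INDEX
postgres=# create index concurrently on t(a);
CREATE INDEX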

To be continued.

Source: https://habr.com/ru/post/326096/

