In previous articles, we looked at PostgreSQL's indexing engine and the interface of access methods, as well as hash indices, B-trees, GiST, SP-GiST, GIN, RUM, and BRIN. It remains for us to look at Bloom indices.
Bloom
General idea
The classic Bloom filter is a data structure that lets you quickly check whether an element belongs to a set. The filter is very compact, but it allows false positives: it may mistakenly report that an element belongs to the set (a false positive), but it is never allowed to say that an element is absent when it is actually present (no false negatives).
The filter is a bitmap (also called a signature) of length m bits, initially filled with zeros. k different hash functions are chosen that map any element of the set to k bits of the signature. To add an element to the set, each of these bits in the signature is set to one. Consequently, if all the bits corresponding to an element are set to one, the element may be present in the set; if at least one bit is zero, the element is definitely absent.
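As a rough illustration, here is a minimal Bloom filter sketch in Python. The class name and the PRNG-based choice of bit positions are my own illustrative assumptions, not the actual implementation of the extension:

```python
import random

class BloomFilter:
    """Toy classic Bloom filter: an m-bit signature, k bits per element."""

    def __init__(self, m, k):
        self.m = m            # signature length in bits
        self.k = k            # bits set per element
        self.bits = [0] * m   # the signature, initially all zeros

    def _positions(self, element):
        # Emulate k independent hash functions by seeding a PRNG
        # with a single hash of the element.
        rng = random.Random(hash(element))
        return [rng.randrange(self.m) for _ in range(self.k)]

    def add(self, element):
        # Adding an element sets each of its k bits to one.
        for pos in self._positions(element):
            self.bits[pos] = 1

    def might_contain(self, element):
        # All k bits set -> "maybe present" (false positives possible);
        # any bit zero  -> definitely absent (no false negatives).
        return all(self.bits[pos] for pos in self._positions(element))

f = BloomFilter(m=96, k=7)
f.add("MIROSLAV SIDOROV")
print(f.might_contain("MIROSLAV SIDOROV"))  # True: no false negatives
```

Note that an added element is always reported as present, while an absent element may occasionally slip through as a false positive.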
In the case of a DBMS index, we effectively have N separate filters, one built for each index row. As a rule, several fields are included in the index, and the values of these fields make up the set of elements for each row.
By choosing the signature size m, we can trade off index size against the probability of false positives. The application area of the Bloom index is large, fairly “wide” tables that are queried with filters on any of the fields. Like BRIN, this access method can be regarded as an accelerator of sequential scanning: all matches found by the index must be rechecked against the table, but there is a chance to skip a significant share of the rows altogether.
Internals
We already talked about signature trees in the context of the GiST access method. Unlike those trees, the Bloom index is a flat structure: it consists of a metapage followed by ordinary pages with index rows. Each index row contains a signature and a reference to a table row (TID), as shown schematically in the figure.

Creating and selecting parameter values
When creating a bloom index, the storage parameters specify the total signature length in bits (length) and the number of bits to be set, separately for each field included in the index (col1 … col32):
create index on ... using bloom(...) with (length=..., col1=..., col2=..., ...);
This way of specifying the number of bits looks odd: these numbers ought to be a parameter of an operator class, not of the index. For now, however, operator classes cannot be parameterized, although work on this is underway.
How to choose suitable values?
Theory says that if we fix the desired probability p of a false positive, the optimal signature length can be estimated as m = −n log2(p) / ln 2, where n is the number of fields in the index, and the number of bits to set per field as k = −log2(p). Inside the index, the signature is stored as an array of two-byte integers, so the value of m can safely be rounded up to a multiple of 16.
When choosing the probability p, keep an eye on the index size, which will be approximately (m/8 + 6) N bytes, where N is the number of rows in the table and 6 is the size of the TID pointer in bytes.
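These formulas are straightforward to evaluate; here is a small sketch in Python (the helper function and its name are mine, not part of any extension):

```python
from math import ceil, log, log2

def bloom_sizing(n, p, N):
    """Estimate bloom index parameters.
    n: fields in the index, p: target false positive probability,
    N: rows in the table."""
    # Optimal signature length: m = -n * log2(p) / ln(2),
    # rounded up to a multiple of 16 (two-byte array elements).
    m = 16 * ceil(-n * log2(p) / log(2) / 16)
    # Bits to set per field: k = -log2(p), rounded up.
    k = ceil(-log2(p))
    # Approximate index size in bytes: (m/8 + 6) * N.
    size = (m // 8 + 6) * N
    return m, k, size

m, k, size = bloom_sizing(n=9, p=0.01, N=30_000_000)
print(m, k, round(size / 2**20))  # 96 7 515
```

With nine indexed fields, p = 0.01, and thirty million rows, this yields a 96-bit signature, 7 bits per field, and an estimated index size of about 515 MB.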
A few points:
- The false positive probability p refers to a single filter; when scanning the table, we should expect Np false positives (for a query that returns few rows, of course). For example, for a table with a million rows and p = 0.01, on average we can expect “Rows Removed by Index Recheck: 10000” in the query plan.
- A Bloom filter is a probabilistic structure. Concrete numbers only make sense averaged over many values; in any specific case, you can get just about anything.
- The estimates above rest on an idealized mathematical model and a number of assumptions, so in practice the result will probably be worse. Treat the formulas calmly: they are just a way to pick initial values for further experiments.
- The access method lets you choose the number of bits to set separately for each field. There is a reasonable assumption that the optimal number actually depends on the distribution of values in the column. If you want to dig deeper, see, for example, this article (I would be grateful for links to other studies). But first reread the previous point.
Update
When a new row is inserted into the table, a signature is formed: for the values of all indexed fields, all the corresponding bits are set to one. Classically, this requires k different hash functions; in practice, the method makes do with a pseudo-random number generator whose seed is chosen each time with a single hash function.
An ordinary Bloom filter does not support removing elements, but a Bloom index does not need to: deleting a row from the table removes the whole signature along with the index row.
An update, as usual, consists of deleting the old row version and inserting a new one (with the signature recomputed).
Scanning
Since the only thing a Bloom filter can do is check whether an element belongs to a set, the only operation a bloom index supports is the equality test (just like the hash index).
As we said, the Bloom index is flat, so an index scan always reads it sequentially and in full. While reading, a bitmap is built, which is then used to access the table.
Conventional index access assumes that only a few index pages will be read, and that they may well be needed again soon, so they are kept in the buffer cache. Reading a bloom index, by contrast, is effectively a sequential scan, so to avoid evicting useful data from the cache, it goes through a small buffer ring, just like a sequential scan of a table.
It should also be noted that the larger the bloom index, the less attractive it looks to the planner; this dependence is linear, unlike tree indices.
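To make the scan mechanics concrete, here is a toy model in Python of N per-row filters: each row gets a signature built from its field values, a query collects the bits it requires, candidate rows are selected by bitwise comparison, and every candidate is rechecked against the table. All names, the toy parameters, and the md5-seeded PRNG are illustrative assumptions, not the extension's actual code:

```python
import hashlib
import random

M, K = 64, 4  # toy signature length and bits per field

def positions(value):
    # One hash of the value seeds a PRNG that picks K bit positions,
    # emulating K hash functions (deterministic across runs via md5).
    seed = int.from_bytes(hashlib.md5(repr(value).encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.randrange(M) for _ in range(K)]

def signature(row):
    sig = 0
    for value in row:                  # every indexed field contributes K bits
        for p in positions(value):
            sig |= 1 << p
    return sig

def bloom_scan(table, signatures, query):
    """query: {column_index: value}. Returns (matching row indexes,
    number of false positives removed by the recheck)."""
    need = 0
    for v in query.values():
        for p in positions(v):
            need |= 1 << p
    # A row is a candidate if all required bits are set in its signature.
    candidates = [i for i, s in enumerate(signatures) if s & need == need]
    # Recheck every candidate against the actual table row.
    matches = [i for i in candidates
               if all(table[i][c] == v for c, v in query.items())]
    return matches, len(candidates) - len(matches)

table = [("DME", "PG0001"), ("LED", "PG0002"), ("DME", "PG0003")]
sigs = [signature(row) for row in table]
matches, removed = bloom_scan(table, sigs, {0: "DME"})
print(matches)  # [0, 2] -- the recheck guarantees no wrong answers
```

The recheck step makes the final result exact: false positives only cost extra table reads, never wrong answers.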
Example
Table
Let's look at the Bloom index using the large flights_bi table from the previous article. Let me remind you that the table is about 4 GB in size, with roughly 30 million rows. Table definition:
demo=# \d flights_bi
Table "bookings.flights_bi"
Column | Type | Collation | Nullable | Default
--------------------+--------------------------+-----------+----------+---------
airport_code | character(3) | | |
airport_coord | point | | |
airport_utc_offset | interval | | |
flight_no | character(6) | | |
flight_type | text | | |
scheduled_time | timestamp with time zone | | |
actual_time | timestamp with time zone | | |
aircraft_code | character(3) | | |
seat_no | character varying(4) | | |
fare_conditions | character varying(10) | | |
passenger_id | character varying(20) | | |
passenger_name | text | | |
Start by creating the extension: the Bloom index has been included in the standard distribution since version 9.6, but it is not available by default.
demo=# create extension bloom;
CREATE EXTENSION
Last time, using BRIN, we managed to index three fields (scheduled_time, actual_time, airport_utc_offset). Bloom indices do not depend on the physical location of the data, so we will try to include almost all of the table's fields in the index. However, we exclude the timestamps (scheduled_time and actual_time): the method only supports comparison for equality, and a query for an exact time is hardly of interest to anyone (one could build an index on an expression that rounds the time to a day, but we will not do that). We also have to exclude the airport coordinates (airport_coord): looking ahead, the point type is not supported.
To pick parameter values, let's take a false positive probability of 0.01 (understanding that in practice it will turn out higher). The formulas above, for n = 9 and N = 30,000,000, give a signature size of 96 bits and suggest setting 7 bits per element. The estimated index size is 515 MB (roughly one eighth of the table).
(With the minimal signature size of 16 bits, the formulas promise an index half that size, but then we can only hope for a probability of 0.5, which is rather poor.)
Operator Classes
So, we try to create an index.
demo=# create index flights_bi_bloom on flights_bi
using bloom(airport_code, airport_utc_offset, flight_no, flight_type, aircraft_code, seat_no, fare_conditions, passenger_id, passenger_name)
with (length=96, col1=7, col2=7, col3=7, col4=7, col5=7, col6=7, col7=7, col8=7, col9=7);
ERROR: data type character has no default operator class for access method "bloom"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
Alas, the extension provides only two operator classes:
demo=# select opcname, opcintype::regtype
from pg_opclass
where opcmethod = (select oid from pg_am where amname = 'bloom')
order by opcintype::regtype::text;
opcname | opcintype
----------+-----------
int4_ops | integer
text_ops | text
(2 rows)
Fortunately, it is not difficult to create similar classes for other data types. An operator class for the bloom method must include exactly one operator, equality, and exactly one support function, the hash function. The easiest way to find the necessary operators and functions for an arbitrary type is to look in the system catalog at the operator classes of the hash method:
demo=# select distinct
opc.opcintype::regtype::text,
amop.amopopr::regoperator,
ampr.amproc
from pg_am am, pg_opclass opc, pg_amop amop, pg_amproc ampr
where am.amname = 'hash'
and opc.opcmethod = am.oid
and amop.amopfamily = opc.opcfamily
and amop.amoplefttype = opc.opcintype
and amop.amoprighttype = opc.opcintype
and ampr.amprocfamily = opc.opcfamily
and ampr.amproclefttype = opc.opcintype
order by opc.opcintype::regtype::text;
opcintype | amopopr | amproc
-----------+----------------------+--------------
abstime | =(abstime,abstime) | hashint4
aclitem | =(aclitem,aclitem) | hash_aclitem
anyarray | =(anyarray,anyarray) | hash_array
anyenum | =(anyenum,anyenum) | hashenum
anyrange | =(anyrange,anyrange) | hash_range
...
Armed with this information, let's create the two missing classes:
demo=# CREATE OPERATOR CLASS character_ops
DEFAULT FOR TYPE character USING bloom AS
OPERATOR 1 =(character,character),
FUNCTION 1 hashbpchar;
CREATE OPERATOR CLASS
demo=# CREATE OPERATOR CLASS interval_ops
DEFAULT FOR TYPE interval USING bloom AS
OPERATOR 1 =(interval,interval),
FUNCTION 1 interval_hash;
CREATE OPERATOR CLASS
No hash function is defined for points (the point type); that is exactly why we cannot build a bloom index on such a field (just as we cannot perform a hash join on fields of this type).
We try again:
demo=# create index flights_bi_bloom on flights_bi
using bloom(airport_code, airport_utc_offset, flight_no, flight_type, aircraft_code, seat_no, fare_conditions, passenger_id, passenger_name)
with (length=96, col1=7, col2=7, col3=7, col4=7, col5=7, col6=7, col7=7, col8=7, col9=7);
CREATE INDEX
The index size is 526 MB, somewhat larger than calculated, because the formula does not account for the per-page overhead of service information.
demo=# select pg_size_pretty(pg_total_relation_size('flights_bi_bloom'));
pg_size_pretty
----------------
526 MB
(1 row)
Queries
Now we can search on a variety of criteria, and the search will be supported by the index.
As we said, the Bloom filter is a probabilistic structure, so its effectiveness can vary from case to case. For example, let's look at the records of two passengers. Miroslav Sidorov:
demo=# explain(costs off,analyze)
select * from flights_bi where passenger_name='MIROSLAV SIDOROV';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Bitmap Heap Scan on flights_bi (actual time=2639.010..3010.692 rows=2 loops=1)
Recheck Cond: (passenger_name = 'MIROSLAV SIDOROV'::text)
Rows Removed by Index Recheck: 38562
Heap Blocks: exact=21726
-> Bitmap Index Scan on flights_bi_bloom (actual time=1065.191..1065.191 rows=38564 loops=1)
Index Cond: (passenger_name = 'MIROSLAV SIDOROV'::text)
Planning time: 0.109 ms
Execution time: 3010.736 ms
and Marfa Soloveva:
demo=# explain(costs off,analyze)
select * from flights_bi where passenger_name='MARFA SOLOVEVA';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Bitmap Heap Scan on flights_bi (actual time=9980.884..10142.684 rows=2 loops=1)
Recheck Cond: (passenger_name = 'MARFA SOLOVEVA'::text)
Rows Removed by Index Recheck: 3950168
Heap Blocks: exact=45757 lossy=67332
-> Bitmap Index Scan on flights_bi_bloom (actual time=1037.588..1037.588 rows=212972 loops=1)
Index Cond: (passenger_name = 'MARFA SOLOVEVA'::text)
Planning time: 0.157 ms
Execution time: 10142.730 ms
In one case, the filter let through only 40 thousand false positives, in the other as many as 4 million, and the query execution times differ accordingly.
And here are the results of searching for the same rows by document number rather than by name. Miroslav:
demo=# explain(costs off,analyze)
demo-# select * from flights_bi where passenger_id='5864 006033';
QUERY PLAN
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on flights_bi (actual time=13747.305..16907.387 rows=2 loops=1)
Recheck Cond: ((passenger_id)::text = '5864 006033'::text)
Rows Removed by Index Recheck: 9620258
Heap Blocks: exact=50510 lossy=165722
-> Bitmap Index Scan on flights_bi_bloom (actual time=937.202..937.202 rows=426474 loops=1)
Index Cond: ((passenger_id)::text = '5864 006033'::text)
Planning time: 0.110 ms
Execution time: 16907.423 ms
And Marfa:
demo=# explain(costs off,analyze)
select * from flights_bi where passenger_id='2461 559238';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Bitmap Heap Scan on flights_bi (actual time=3881.615..3934.481 rows=2 loops=1)
Recheck Cond: ((passenger_id)::text = '2461 559238'::text)
Rows Removed by Index Recheck: 30669
Heap Blocks: exact= 27513
-> Bitmap Index Scan on flights_bi_bloom (actual time=1084.391..1084.391 rows=30671 loops=1)
Index Cond: ((passenger_id)::text = '2461 559238'::text)
Planning time: 0.120 ms
Execution time: 3934.517 ms
Again, the efficiency differs widely; this time Marfa was luckier.
Note that a search on two fields simultaneously is performed much more efficiently, since the false positive probability p turns into p²:
demo=# explain(costs off,analyze)
select * from flights_bi
where passenger_name='MIROSLAV SIDOROV'
and passenger_id='5864 006033';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on flights_bi (actual time=872.593..877.915 rows=2 loops=1)
Recheck Cond: (((passenger_id)::text = '5864 006033'::text)
AND (passenger_name = 'MIROSLAV SIDOROV'::text))
Rows Removed by Index Recheck: 357
Heap Blocks: exact=356
-> Bitmap Index Scan on flights_bi_bloom (actual time=832.041..832.041 rows=359 loops=1)
Index Cond: (((passenger_id)::text = '5864 006033'::text)
AND (passenger_name = 'MIROSLAV SIDOROV'::text))
Planning time: 0.524 ms
Execution time: 877.967 ms
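A back-of-the-envelope check of the p → p² effect, under the same idealized independence assumption as the sizing formulas (which is why the real numbers in the plans above differ):

```python
p = 0.01          # per-filter false positive probability used when sizing
N = 30_000_000    # rows in flights_bi

# Expected spurious candidates before the recheck, idealized:
print(round(N * p))       # 300000 for a one-field condition
print(round(N * p * p))   # 3000 for a two-field condition: quadratically fewer
```

The recheck cost drops roughly quadratically as equality conditions on additional indexed fields are added.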
A condition with a logical “or”, however, is not supported at all. This is a limitation not of the access method but of the planner. Of course, one could read the index twice, build two bitmaps, and merge them, but that would probably be too expensive for such a plan to be chosen.
Comparison with BRIN and Hash
The application areas of Bloom and BRIN indices obviously overlap: both target large tables where searching by various fields is desirable, and search precision is sacrificed for compactness.
BRIN indices are smaller (up to tens of megabytes in our example) and can also support range searches, but they come with a very strong restriction tied to the physical layout of data in the file. Bloom indices are larger (hundreds of megabytes), but they have no restrictions other than the availability of a suitable hash function.
Like a Bloom index, a hash index supports only the equality test. The hash index gives search precision unattainable for Bloom, but it is considerably larger (in our example, over a gigabyte for just one field), and a hash index cannot be built over several fields at all.
Properties
Traditionally, let's look at the properties (the queries were shown earlier).
Method properties:
amname | name | pg_indexam_has_property
--------+---------------+-------------------------
bloom | can_order | f
bloom | can_unique | f
bloom | can_multi_col | t
bloom | can_exclude | f
Obviously, the access method allows multicolumn indexes; a Bloom index on a single column hardly makes sense.
Index properties:
name | pg_index_has_property
---------------+-----------------------
clusterable | f
index_scan | f
bitmap_scan | t
backward_scan | f
The only scan method available is the bitmap scan. Since the index is always read in full, it makes no sense to implement ordinary index access, which returns TIDs one at a time.
name | pg_index_column_has_property
--------------------+------------------------------
asc | f
desc | f
nulls_first | f
nulls_last | f
orderable | f
distance_orderable | f
returnable | f
search_array | f
search_nulls | f
Everything here is “f”; this access method cannot even handle NULL values.
And finally
I do not rule out that this series of articles will continue someday, as new interesting index types appear, but for now it is time to stop.
I want to thank my colleagues at Postgres Professional (some of whom authored many of the access methods discussed) for reading the drafts and for their comments. And, of course, I appreciate your patience and valuable feedback; your attention gave me the strength to get this far. Thank you!