At the last PGConf.Russia there was a report about the expansion of MobilityDB , and Andrei Borodin proposed the idea of expanding the methods of the indices for the task.
I will continue the theme with the Postgres extension for the problem being solved on the example of the extension made within the HighLoad Cup 2018, the code is available on GithHub . On Habré there is already an article from blackmaster . The extension adds two types with support for btree and GIN indexes.
Initial scheme:
CREATE TABLE accounts ( id INTEGER PRIMARY KEY, email VARCHAR(100) UNIQUE, fname SMALLINT, sname VARCHAR(50), phone VARCHAR(16) UNIQUE, birth TIMESTAMP NOT NULL, country SMALLINT, city SMALLINT, email_domain SMALLINT, joined TIMESTAMP NOT NULL, status status, sex sex NOT NULL, interests SMALLINT[], premium tsrange, likes_ids INTEGER[], likes_tss INTEGER[], CHECK ( joined >= '01.01.2011'::timestamp ), CHECK ( joined <= '01.01.2018'::timestamp ) ); CREATE INDEX IF NOT EXISTS interests_ids_index ON accounts USING GIN(interests);
The number of entries is 1,300,000 , and the dimensions of the relationship of interest are:
relation | size |
---|---|
accounts | 598 MB |
accounts_email_key | 54 MB |
accounts_phone_key | 32 MB |
accounts_pkey | 28 MB |
interests_ids_index | 8072 kB |
The final scheme:
CREATE UNLOGGED TABLE accounts ( id INTEGER PRIMARY KEY, email VARCHAR(100) UNIQUE, fname SMALLINT, sname VARCHAR(50), phone phone UNIQUE, birth TIMESTAMP NOT NULL, country SMALLINT, city SMALLINT, email_domain SMALLINT, joined TIMESTAMP NOT NULL, status status, sex sex NOT NULL, interests bit128, premium tsrange, likes_ids INTEGER[], likes_tss INTEGER[], CHECK ( joined >= '01.01.2011'::timestamp ), CHECK ( joined <= '01.01.2018'::timestamp ) );
Dimensions:
relation | old size | new size |
---|---|---|
accounts | 598 MB | 579 MB |
accounts_phone_key | 32 MB | 28 MB |
accounts_pkey | 28 MB | 28 MB |
interests_ids_index | 8072 kB | 8072 kB |
Two types have been added: phone and bit128 .
The phone number looks like 8 (929) 5481819 , the eight can be discarded. Operator code fits into 10 bits, the number is 24 bits, it turns out you need 5 bytes. Not very convenient number due to the alignment of data in memory. The question of how best to lay data in 5, 6 or 8 bytes, the answer can only give tests.
If you follow the example of the documentation, you need a little bit. Determine the type:
class Phone : public PAllocNew<Phone> { public: bool fromCString(char* str) noexcept; char* toCString() const noexcept; int code() const noexcept; bool operator==(const Phone& b) const noexcept; bool operator<(const Phone& b) const noexcept; bool operator<=(const Phone& b) const noexcept; bool operator>(const Phone& b) const noexcept; bool operator>=(const Phone& b) const noexcept; private: uint64_t m_data = 0; }; #define GETARG_PHONE(x) reinterpret_cast<Phone*>(PG_GETARG_POINTER(x))
PAllocNew helper class overloading new
:
template<typename T> class PAllocNew { public: static void* operator new(std::size_t count, const std::nothrow_t& tag) { return palloc(count); } };
The memory allocated through palloc is released when the transaction is completed. Add I / O functions:
Datum phone_in(PG_FUNCTION_ARGS) { char* str = PG_GETARG_CSTRING(0); auto v = new(std::nothrow) Phone; if (!v->fromCString(str)) { ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg( "invalid input syntax for phone: \"%s\"", str ))); } PG_RETURN_POINTER(v); } Datum phone_out(PG_FUNCTION_ARGS) { const auto phone = GETARG_PHONE(0); PG_RETURN_CSTRING(phone->toCString()); }
And type registration:
CREATE TYPE phone; CREATE OR REPLACE FUNCTION phone_in ( cstring ) RETURNS phone AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OR REPLACE FUNCTION phone_out ( phone ) RETURNS cstring AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE TYPE phone ( INTERNALLENGTH = 8, INPUT = phone_in, OUTPUT = phone_out, ALIGNMENT = int4, COLLATE TRUE );
Since A new type has a size of no more than 8 bytes, then it can be transmitted not by reference, but by value. In this case, in the CREATE TYPE
you need to specify the PASSEDBYVALUE
flag. Or use LIKE
:
CREATE TYPE phone ( INPUT = phone_in, OUTPUT = phone_out, LIKE = int8, COLLATE TRUE );
Then INTERNALLENGTH
, ALIGNMENT
and PASSEDBYVALUE
inherit from int8.
The uniqueness of the field is supported by creating a unique B-tree index. This is done through the classes of operators and strategies .
Operator | room |
---|---|
less | one |
less or equal | 2 |
equally | 3 |
more or equal | four |
more | five |
Function | room |
---|---|
equally | one |
CREATE OPERATOR = ( PROCEDURE = phone_equal, LEFTARG = phone, RIGHTARG = phone, COMMUTATOR = =, NEGATOR = != ); CREATE OPERATOR < ( PROCEDURE = phone_lt, LEFTARG = phone, RIGHTARG = phone, NEGATOR = > ); CREATE OPERATOR <= ( PROCEDURE = phone_le, LEFTARG = phone, RIGHTARG = phone ); CREATE OPERATOR >= ( PROCEDURE = phone_ge, LEFTARG = phone, RIGHTARG = phone ); CREATE OPERATOR > ( PROCEDURE = phone_gt, LEFTARG = phone, RIGHTARG = phone, NEGATOR = < ); CREATE OPERATOR CLASS btree_phone_ops DEFAULT FOR TYPE phone USING btree AS OPERATOR 1 <, OPERATOR 2 <=, OPERATOR 3 =, OPERATOR 4 >=, OPERATOR 5 >, FUNCTION 1 phone_equal_internal ( phone, phone );
Operators return bool
, and phone_equal_internal
int:
Datum phone_equal_internal(PG_FUNCTION_ARGS) { const auto a = GETARG_PHONE(0); const auto b = GETARG_PHONE(1); if (*a < *b) { PG_RETURN_INT32(-1); } if (*a > *b) { PG_RETURN_INT32(1); } PG_RETURN_INT32(0); }
All this gave a slight decrease in data:
relation | size | diff |
---|---|---|
accounts | 595 MB | 3 MB |
accounts_phone_key | 28 MB | 4 MB |
The number of accounts with numbers 533 289, which is 41%. At least there is no string comparison when working with an index.
I wanted to have a bitfield with intersection (&&), occurrences (<@) and GIN support. It was enough 96 bit, but went on a simple path and took uint128
.
class BitSet128: public PAllocNew<BitSet128> { public: bool haveIntersection(const BitSet128& b) const noexcept; bool contains(const BitSet128& b) const noexcept; bool operator==(const BitSet128& b) const noexcept; bool operator<(const BitSet128& b) const noexcept; bool operator>(const BitSet128& b) const noexcept; bool fromCString(char* str) noexcept; char* toCString() const noexcept; std::vector<uint8_t> keys() const; private: uint128 m_bits = 0; }; #define GETARG_BIT128(x) reinterpret_cast<BitSet128*>(PG_GETARG_POINTER(x))
Type registration and operation:
CREATE TYPE bit128 ( INTERNALLENGTH = 16, INPUT = bit128_in, OUTPUT = bit128_out, ALIGNMENT = int4 ); CREATE OPERATOR = ( PROCEDURE = bit128_equal, LEFTARG = bit128, RIGHTARG = bit128, COMMUTATOR = =, NEGATOR = != ); CREATE OPERATOR && ( PROCEDURE = bit128_intersection, LEFTARG = bit128, RIGHTARG = bit128 ); CREATE OPERATOR @> ( PROCEDURE = bit128_containts, LEFTARG = bit128, RIGHTARG = bit128 );
About GIN in Postgres was well written by Egor Rogov erogov in the article Indices in PostgreSQL - 7 , applying to the full-text search task. The GIN extensibility documentation says that 4 functions are sufficient:
Extracting an array of keys from an indexed object:
Datum *extractValue(Datum itemValue, int32 *nkeys, bool **nullFlags)
Retrieves keys for the requested value:
Datum *extractQuery(Datum query, int32 *nkeys, StrategyNumber n, bool **pmatch, Pointer **extra_data, bool **nullFlags, int32 *searchMode)
For example in the query:
SELECT * FROM test WHERE a && ARRAY[1, 2, 3]
extractQuery is applied to the ARRAY[1, 2, 3]
array ARRAY[1, 2, 3]
, and the keys can be 1, 2, 3.
Checks if the value satisfies the query:
bool consistent(bool check[], StrategyNumber n, Datum query, int32 nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool nullFlags[])
Since the extracted keys are stored in the search tree, the comparison function is needed:
int compare(Datum a, Datum b)
The API may seem redundant, but it provides everything that might come in handy. Let's turn to implementation. Extract keys:
Datum gin_extract_value_bit128(PG_FUNCTION_ARGS) { auto bitset = GETARG_BIT128(0); const auto bits = bitset->keys(); int32* nentries = (int32*) PG_GETARG_POINTER(1); *nentries = bits.size(); Datum* entries = NULL; entries = (Datum*) palloc(sizeof(Datum) * bits.size()); for (int i = 0; i < bits.size(); ++i) { entries[i] = DatumGetInt16(bits[i]); } PG_RETURN_POINTER(entries); }
In the output parameter nentries
write the number of keys and return an array of keys of the entries
. Key comparison:
Datum bit128_cmp(PG_FUNCTION_ARGS) { const auto a = PG_GETARG_INT16(0); const auto b = PG_GETARG_INT16(1); if (a < b) { PG_RETURN_INT32(-1); } if (a > b) { PG_RETURN_INT32(1); } PG_RETURN_INT32(0); }
These two functions are enough to create an index. The remaining functions are used when searching by index. Extracting keys from the request:
Datum gin_extract_query_bit128(PG_FUNCTION_ARGS) { const auto a = GETARG_BIT128(0); int32* nentries = (int32*) PG_GETARG_POINTER(1); StrategyNumber strategy = PG_GETARG_UINT16(2); int32* searchMode = (int32*) PG_GETARG_POINTER(6); Datum* res = nullptr; const auto keys = a->keys(); *nentries = keys.size(); res = (Datum*) palloc(sizeof(Datum) * keys.size()); for (int i = 0; i < keys.size(); ++i) { res[i] = DatumGetInt16(keys[i]); } switch (strategy) { case RTOverlapStrategyNumber: // && if (*nentries > 0) { *searchMode = GIN_SEARCH_MODE_DEFAULT; } else { *searchMode = GIN_SEARCH_MODE_ALL; } break; case RTContainsStrategyNumber: // @> if (*nentries > 0) { *searchMode = GIN_SEARCH_MODE_DEFAULT; } else { *searchMode = GIN_SEARCH_MODE_ALL; } break; } PG_RETURN_POINTER(res); }
The first parameter is the value to the right of the operator, the third parameter is responsible for the strategy (operator) by which the search is performed. About the modes of poska better read in the documentation . Compliance function:
Datum gin_bit128_consistent(PG_FUNCTION_ARGS) { bool* check = (bool*) PG_GETARG_POINTER(0); StrategyNumber strategy = PG_GETARG_UINT16(1); int32 nkeys = PG_GETARG_INT32(3); bool* recheck = (bool*) PG_GETARG_POINTER(5); bool res = true; switch (strategy) { case RTOverlapStrategyNumber: // && { for (int i = 0; i < nkeys; ++i) { if (check[i]) { res = true; } }; *recheck = false; } break; case RTContainsStrategyNumber: // @> { for (int i = 0; i < nkeys; ++i) { if (check[i]) { res = true; } }; *recheck = true; } break; } PG_RETURN_BOOL(res); }
Returns true
if the indexed object satisfies the query operator with the strategy number. The check array has a length of nkeys , which corresponds to the number of keys returned by gin_extract_query_bit128 . The check[i] = true
means that the i-th key from gin_extract_query_bit128 is present in the indexed object. This is enough for intersection, but because in GIN we do not work with the values themselves, then for the entry strategy we set recheck to true. This will call the appropriate operator to check the result. Operator class:
CREATE OR REPLACE FUNCTION gin_extract_value_bit128 ( bit128, internal, internal) RETURNS internal AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OR REPLACE FUNCTION gin_extract_query_bit128(bit128, internal, int2, internal, internal) RETURNS internal AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OR REPLACE FUNCTION gin_bit128_consistent(internal, int2, anyelement, int4, internal, internal) RETURNS bool AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OPERATOR CLASS bit128_ops DEFAULT FOR TYPE bit128 USING gin AS OPERATOR 3 &&, OPERATOR 6 =, OPERATOR 7 @>, FUNCTION 1 bit128_cmp (int2, int2 ), FUNCTION 2 gin_extract_value_bit128(bit128, internal, internal), FUNCTION 3 gin_extract_query_bit128(bit128, internal, int2, internal, internal), FUNCTION 4 gin_bit128_consistent(internal, int2, anyelement, int4, internal, internal), STORAGE int2;
Added STORAGE , it indicates what type of data is used for keys. What is the result was on the disk:
relation | old size | new size | diff |
---|---|---|---|
accounts | 595 MB | 579 MB | 16 MB |
interests_ids_index | 8072 kB | 8072 kB |
An interesting result, because there are nuances:
All created types have a fixed size, so PLAIN storage is selected for them. They do not apply compression and separate storage. Arrays and strings as variable length types pass through TOAST. Yes, there is an overhead of storage size, but there is also compression.
The distribution of interests is as follows:
Number of interests | users |
---|---|
one | 174061 |
2 | 279744 |
3 | 317212 |
four | 262313 |
five | 128512 |
6 | 48099 |
7 | 12228 |
eight | 2232 |
9 | 250 |
null | 75349 |
Honestly, I don’t know how SMALLINT[]
released, but suppose 4 bytes for size and 2 bytes for value. Then in 16 bytes you can fit an array of 6 elements. And 98% of the elements would fit in 16 bytes and the difference in 16 MB. It seems that the bit128
type bit128
redundant, but the standard type is redundant in this data set.
Source: https://habr.com/ru/post/442058/
All Articles