📜 ⬆️ ⬇️

Extensible Postgres


At the last PGConf.Russia there was a report about the expansion of MobilityDB , and Andrei Borodin proposed the idea of expanding the methods of the indices for the task.


I will continue the theme with the Postgres extension for the problem being solved on the example of the extension made within the HighLoad Cup 2018, the code is available on GithHub . On Habré there is already an article from blackmaster . The extension adds two types with support for btree and GIN indexes.


Initial scheme:


CREATE TABLE accounts ( id INTEGER PRIMARY KEY, email VARCHAR(100) UNIQUE, fname SMALLINT, sname VARCHAR(50), phone VARCHAR(16) UNIQUE, birth TIMESTAMP NOT NULL, country SMALLINT, city SMALLINT, email_domain SMALLINT, joined TIMESTAMP NOT NULL, status status, sex sex NOT NULL, interests SMALLINT[], premium tsrange, likes_ids INTEGER[], likes_tss INTEGER[], CHECK ( joined >= '01.01.2011'::timestamp ), CHECK ( joined <= '01.01.2018'::timestamp ) ); CREATE INDEX IF NOT EXISTS interests_ids_index ON accounts USING GIN(interests); 

The number of entries is 1,300,000 , and the dimensions of the relationship of interest are:


relationsize
accounts598 MB
accounts_email_key54 MB
accounts_phone_key32 MB
accounts_pkey28 MB
interests_ids_index8072 kB

The final scheme:


 CREATE UNLOGGED TABLE accounts ( id INTEGER PRIMARY KEY, email VARCHAR(100) UNIQUE, fname SMALLINT, sname VARCHAR(50), phone phone UNIQUE, birth TIMESTAMP NOT NULL, country SMALLINT, city SMALLINT, email_domain SMALLINT, joined TIMESTAMP NOT NULL, status status, sex sex NOT NULL, interests bit128, premium tsrange, likes_ids INTEGER[], likes_tss INTEGER[], CHECK ( joined >= '01.01.2011'::timestamp ), CHECK ( joined <= '01.01.2018'::timestamp ) ); 

Dimensions:


relationold sizenew size
accounts598 MB579 MB
accounts_phone_key32 MB28 MB
accounts_pkey28 MB28 MB
interests_ids_index8072 kB8072 kB

Two types have been added: phone and bit128 .


phone


The phone number looks like 8 (929) 5481819 , the eight can be discarded. Operator code fits into 10 bits, the number is 24 bits, it turns out you need 5 bytes. Not very convenient number due to the alignment of data in memory. The question of how best to lay data in 5, 6 or 8 bytes, the answer can only give tests.


If you follow the example of the documentation, you need a little bit. Determine the type:


 class Phone : public PAllocNew<Phone> { public: bool fromCString(char* str) noexcept; char* toCString() const noexcept; int code() const noexcept; bool operator==(const Phone& b) const noexcept; bool operator<(const Phone& b) const noexcept; bool operator<=(const Phone& b) const noexcept; bool operator>(const Phone& b) const noexcept; bool operator>=(const Phone& b) const noexcept; private: uint64_t m_data = 0; }; #define GETARG_PHONE(x) reinterpret_cast<Phone*>(PG_GETARG_POINTER(x)) 

PAllocNew helper class overloading new :


 template<typename T> class PAllocNew { public: static void* operator new(std::size_t count, const std::nothrow_t& tag) { return palloc(count); } }; 

The memory allocated through palloc is released when the transaction is completed. Add I / O functions:


 Datum phone_in(PG_FUNCTION_ARGS) { char* str = PG_GETARG_CSTRING(0); auto v = new(std::nothrow) Phone; if (!v->fromCString(str)) { ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg( "invalid input syntax for phone: \"%s\"", str ))); } PG_RETURN_POINTER(v); } Datum phone_out(PG_FUNCTION_ARGS) { const auto phone = GETARG_PHONE(0); PG_RETURN_CSTRING(phone->toCString()); } 

And type registration:


 CREATE TYPE phone; CREATE OR REPLACE FUNCTION phone_in ( cstring ) RETURNS phone AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OR REPLACE FUNCTION phone_out ( phone ) RETURNS cstring AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE TYPE phone ( INTERNALLENGTH = 8, INPUT = phone_in, OUTPUT = phone_out, ALIGNMENT = int4, COLLATE TRUE ); 

Since A new type has a size of no more than 8 bytes, then it can be transmitted not by reference, but by value. In this case, in the CREATE TYPE you need to specify the PASSEDBYVALUE flag. Or use LIKE :


 CREATE TYPE phone ( INPUT = phone_in, OUTPUT = phone_out, LIKE = int8, COLLATE TRUE ); 

Then INTERNALLENGTH , ALIGNMENT and PASSEDBYVALUE inherit from int8.


btree index


The uniqueness of the field is supported by creating a unique B-tree index. This is done through the classes of operators and strategies .


Operatorroom
lessone
less or equal2
equally3
more or equalfour
morefive

Functionroom
equallyone

 CREATE OPERATOR = ( PROCEDURE = phone_equal, LEFTARG = phone, RIGHTARG = phone, COMMUTATOR = =, NEGATOR = != ); CREATE OPERATOR < ( PROCEDURE = phone_lt, LEFTARG = phone, RIGHTARG = phone, NEGATOR = > ); CREATE OPERATOR <= ( PROCEDURE = phone_le, LEFTARG = phone, RIGHTARG = phone ); CREATE OPERATOR >= ( PROCEDURE = phone_ge, LEFTARG = phone, RIGHTARG = phone ); CREATE OPERATOR > ( PROCEDURE = phone_gt, LEFTARG = phone, RIGHTARG = phone, NEGATOR = < ); CREATE OPERATOR CLASS btree_phone_ops DEFAULT FOR TYPE phone USING btree AS OPERATOR 1 <, OPERATOR 2 <=, OPERATOR 3 =, OPERATOR 4 >=, OPERATOR 5 >, FUNCTION 1 phone_equal_internal ( phone, phone ); 

Operators return bool , and phone_equal_internal int:


 Datum phone_equal_internal(PG_FUNCTION_ARGS) { const auto a = GETARG_PHONE(0); const auto b = GETARG_PHONE(1); if (*a < *b) { PG_RETURN_INT32(-1); } if (*a > *b) { PG_RETURN_INT32(1); } PG_RETURN_INT32(0); } 

All this gave a slight decrease in data:


relationsizediff
accounts595 MB3 MB
accounts_phone_key28 MB4 MB

The number of accounts with numbers 533 289, which is 41%. At least there is no string comparison when working with an index.


bit128


I wanted to have a bitfield with intersection (&&), occurrences (<@) and GIN support. It was enough 96 bit, but went on a simple path and took uint128 .


 class BitSet128: public PAllocNew<BitSet128> { public: bool haveIntersection(const BitSet128& b) const noexcept; bool contains(const BitSet128& b) const noexcept; bool operator==(const BitSet128& b) const noexcept; bool operator<(const BitSet128& b) const noexcept; bool operator>(const BitSet128& b) const noexcept; bool fromCString(char* str) noexcept; char* toCString() const noexcept; std::vector<uint8_t> keys() const; private: uint128 m_bits = 0; }; #define GETARG_BIT128(x) reinterpret_cast<BitSet128*>(PG_GETARG_POINTER(x)) 

Type registration and operation:


 CREATE TYPE bit128 ( INTERNALLENGTH = 16, INPUT = bit128_in, OUTPUT = bit128_out, ALIGNMENT = int4 ); CREATE OPERATOR = ( PROCEDURE = bit128_equal, LEFTARG = bit128, RIGHTARG = bit128, COMMUTATOR = =, NEGATOR = != ); CREATE OPERATOR && ( PROCEDURE = bit128_intersection, LEFTARG = bit128, RIGHTARG = bit128 ); CREATE OPERATOR @> ( PROCEDURE = bit128_containts, LEFTARG = bit128, RIGHTARG = bit128 ); 

Generalized Inverted Index / GIN


About GIN in Postgres was well written by Egor Rogov erogov in the article Indices in PostgreSQL - 7 , applying to the full-text search task. The GIN extensibility documentation says that 4 functions are sufficient:


Extracting an array of keys from an indexed object:


 Datum *extractValue(Datum itemValue, int32 *nkeys, bool **nullFlags) 

Retrieves keys for the requested value:


 Datum *extractQuery(Datum query, int32 *nkeys, StrategyNumber n, bool **pmatch, Pointer **extra_data, bool **nullFlags, int32 *searchMode) 

For example in the query:


 SELECT * FROM test WHERE a && ARRAY[1, 2, 3] 

extractQuery is applied to the ARRAY[1, 2, 3] array ARRAY[1, 2, 3] , and the keys can be 1, 2, 3.
Checks if the value satisfies the query:


 bool consistent(bool check[], StrategyNumber n, Datum query, int32 nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool nullFlags[]) 

Since the extracted keys are stored in the search tree, the comparison function is needed:


 int compare(Datum a, Datum b) 

The API may seem redundant, but it provides everything that might come in handy. Let's turn to implementation. Extract keys:


 Datum gin_extract_value_bit128(PG_FUNCTION_ARGS) { auto bitset = GETARG_BIT128(0); const auto bits = bitset->keys(); int32* nentries = (int32*) PG_GETARG_POINTER(1); *nentries = bits.size(); Datum* entries = NULL; entries = (Datum*) palloc(sizeof(Datum) * bits.size()); for (int i = 0; i < bits.size(); ++i) { entries[i] = DatumGetInt16(bits[i]); } PG_RETURN_POINTER(entries); } 

In the output parameter nentries write the number of keys and return an array of keys of the entries . Key comparison:


 Datum bit128_cmp(PG_FUNCTION_ARGS) { const auto a = PG_GETARG_INT16(0); const auto b = PG_GETARG_INT16(1); if (a < b) { PG_RETURN_INT32(-1); } if (a > b) { PG_RETURN_INT32(1); } PG_RETURN_INT32(0); } 

These two functions are enough to create an index. The remaining functions are used when searching by index. Extracting keys from the request:


 Datum gin_extract_query_bit128(PG_FUNCTION_ARGS) { const auto a = GETARG_BIT128(0); int32* nentries = (int32*) PG_GETARG_POINTER(1); StrategyNumber strategy = PG_GETARG_UINT16(2); int32* searchMode = (int32*) PG_GETARG_POINTER(6); Datum* res = nullptr; const auto keys = a->keys(); *nentries = keys.size(); res = (Datum*) palloc(sizeof(Datum) * keys.size()); for (int i = 0; i < keys.size(); ++i) { res[i] = DatumGetInt16(keys[i]); } switch (strategy) { case RTOverlapStrategyNumber: // && if (*nentries > 0) { *searchMode = GIN_SEARCH_MODE_DEFAULT; } else { *searchMode = GIN_SEARCH_MODE_ALL; } break; case RTContainsStrategyNumber: // @> if (*nentries > 0) { *searchMode = GIN_SEARCH_MODE_DEFAULT; } else { *searchMode = GIN_SEARCH_MODE_ALL; } break; } PG_RETURN_POINTER(res); } 

The first parameter is the value to the right of the operator, the third parameter is responsible for the strategy (operator) by which the search is performed. About the modes of poska better read in the documentation . Compliance function:


 Datum gin_bit128_consistent(PG_FUNCTION_ARGS) { bool* check = (bool*) PG_GETARG_POINTER(0); StrategyNumber strategy = PG_GETARG_UINT16(1); int32 nkeys = PG_GETARG_INT32(3); bool* recheck = (bool*) PG_GETARG_POINTER(5); bool res = true; switch (strategy) { case RTOverlapStrategyNumber: // && { for (int i = 0; i < nkeys; ++i) { if (check[i]) { res = true; } }; *recheck = false; } break; case RTContainsStrategyNumber: // @> { for (int i = 0; i < nkeys; ++i) { if (check[i]) { res = true; } }; *recheck = true; } break; } PG_RETURN_BOOL(res); } 

Returns true if the indexed object satisfies the query operator with the strategy number. The check array has a length of nkeys , which corresponds to the number of keys returned by gin_extract_query_bit128 . The check[i] = true means that the i-th key from gin_extract_query_bit128 is present in the indexed object. This is enough for intersection, but because in GIN we do not work with the values ​​themselves, then for the entry strategy we set recheck to true. This will call the appropriate operator to check the result. Operator class:


 CREATE OR REPLACE FUNCTION gin_extract_value_bit128 ( bit128, internal, internal) RETURNS internal AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OR REPLACE FUNCTION gin_extract_query_bit128(bit128, internal, int2, internal, internal) RETURNS internal AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OR REPLACE FUNCTION gin_bit128_consistent(internal, int2, anyelement, int4, internal, internal) RETURNS bool AS 'libhighloadcup' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; CREATE OPERATOR CLASS bit128_ops DEFAULT FOR TYPE bit128 USING gin AS OPERATOR 3 &&, OPERATOR 6 =, OPERATOR 7 @>, FUNCTION 1 bit128_cmp (int2, int2 ), FUNCTION 2 gin_extract_value_bit128(bit128, internal, internal), FUNCTION 3 gin_extract_query_bit128(bit128, internal, int2, internal, internal), FUNCTION 4 gin_bit128_consistent(internal, int2, anyelement, int4, internal, internal), STORAGE int2; 

Added STORAGE , it indicates what type of data is used for keys. What is the result was on the disk:


relationold sizenew sizediff
accounts595 MB579 MB16 MB
interests_ids_index8072 kB8072 kB

An interesting result, because there are nuances:



All created types have a fixed size, so PLAIN storage is selected for them. They do not apply compression and separate storage. Arrays and strings as variable length types pass through TOAST. Yes, there is an overhead of storage size, but there is also compression.


The distribution of interests is as follows:


Number of interestsusers
one174061
2279744
3317212
four262313
five128512
648099
712228
eight2232
9250
null75349

Honestly, I don’t know how SMALLINT[] released, but suppose 4 bytes for size and 2 bytes for value. Then in 16 bytes you can fit an array of 6 elements. And 98% of the elements would fit in 16 bytes and the difference in 16 MB. It seems that the bit128 type bit128 redundant, but the standard type is redundant in this data set.


')

Source: https://habr.com/ru/post/442058/


All Articles