
PostgreSQL: Unique keys for a distributed database. Practice

Following up on the article "Unique key in a distributed database".

We have a database that we want to distribute across several servers. Ideally, I would like to set up master-master replication. One of the hardest parts is ensuring that keys are unique on all servers. It would be nice if the database had been designed with scaling in mind from the start... but that ideal is, let's say, rarely met.

So, we have a database that needs to be prepared for master-master replication: we will make all keys in it unique across the whole project.
The article mentioned above considered several options; we will focus on the one suggested by Instagram.

Step 1 - Converting all keys to bigint


This assumes that all our primary keys are called id, and that the columns referencing them are named accordingly: order_id, client_id, table_id, and so on.

Create a function that converts an integer column to bigint:

DROP FUNCTION IF EXISTS "field_int2big" (field text, tablename text, table_schema text);
CREATE OR REPLACE FUNCTION "field_int2big" (field text, tablename text, table_schema text)
RETURNS bigint AS
$body$
DECLARE
BEGIN
    EXECUTE 'ALTER TABLE ' || table_schema || '."' || tablename || '" ALTER COLUMN "' || field || '" TYPE bigint;';
    RETURN 1;
END;
$body$
LANGUAGE 'plpgsql';


Then we select all integer fields and convert them:
select *, field_int2big(column_name, table_name, table_schema)
from (
    select table_catalog, table_schema, table_name, column_name, data_type
    from information_schema.columns
    where table_schema in ('public', 'myscheme')
      and data_type in ('integer', 'oid')
      and (position('id' in column_name) > 0 OR column_name in ('key', 'myfield'))
    order by table_catalog, table_schema, table_name, column_name
    limit 10 offset 0
) c

A few things to look out for:
  1. Add your own schemas as needed: table_schema in ('public', 'myscheme')
  2. You can also add fields with non-standard names: column_name in ('key', 'myfield')
  3. Mind the limit 10 on databases with large tables; you may need to lower it all the way to 1 - changing a column type takes time, and not a little.
  4. Run the query several times; each run picks up the remaining unconverted columns.
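Since the conversion is done in batches, it helps to know how much work is left. A minimal progress check, reusing the same filters as the selection query above (the schema and column names are assumptions from that query):

```sql
-- Hypothetical progress check: counts the id-like integer columns
-- that still await conversion to bigint. Zero means step 1 is done.
select count(*) as remaining
from information_schema.columns
where table_schema in ('public', 'myscheme')
  and data_type in ('integer', 'oid')
  and (position('id' in column_name) > 0 OR column_name in ('key', 'myfield'));
```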


Step 2 - Converting functional indexes that mention the type explicitly


If this is not done, it will cause problems later that are extremely hard to diagnose: the error that comes back is not visible anywhere in the query you are executing.
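To illustrate the failure mode, here is a hypothetical table and functional index whose definition contains an explicit integer cast (the names are invented for this sketch):

```sql
-- Hypothetical example of an index with an explicit integer cast.
CREATE TABLE orders (id bigint, payload json);
CREATE INDEX orders_client_idx ON orders (((payload->>'client_id')::integer));

-- Once client ids outgrow the integer range, an INSERT fails with
-- "integer out of range" during index maintenance, even though the
-- INSERT statement itself never mentions integer anywhere.
```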

DROP FUNCTION IF EXISTS "index_int2big" (idx text, declare_idx text);
CREATE OR REPLACE FUNCTION "index_int2big" (idx text, declare_idx text)
RETURNS text AS
$body$
DECLARE
    new_idx text;
BEGIN
    EXECUTE 'DROP INDEX IF EXISTS ' || idx;
    SELECT replace(declare_idx, 'integer', 'bigint') INTO new_idx;
    EXECUTE new_idx;
    RETURN new_idx;
END;
$body$
LANGUAGE 'plpgsql';

select *, index_int2big(indname, inddef)
from (
    SELECT n.nspname as table_schema,
           c.relname as table_name,
           c2.relname AS indname,
           i.indisprimary, i.indisunique, i.indisclustered,
           pg_catalog.pg_get_indexdef(i.indexrelid, 0, true) AS inddef
    FROM pg_catalog.pg_class c,
         pg_catalog.pg_class c2,
         pg_catalog.pg_index i,
         pg_namespace n
    WHERE n.oid = c.relnamespace
      and c.oid = i.indrelid
      AND i.indexrelid = c2.oid
      and n.nspname in ('bucardo', 'public')
      and position('integer' in pg_catalog.pg_get_indexdef(i.indexrelid, 0, true)) > 0
    limit 10 offset 0
) c

You may notice that I also scan the bucardo schema. If you already have replication based on Bucardo, this step becomes critically important. See also the notes at the end of the previous step.

Step 3 - Creating new sequences for all key fields


The version proposed by Instagram uses a unique number per schema/server, i.e. each schema must contain its own function with its own unique number.
I changed the function slightly so that the unique part of the key is generated from the server's IP address.

CREATE OR REPLACE FUNCTION inet2num(inet) RETURNS numeric AS
$$
DECLARE
    a text[] := string_to_array(host($1), '.');
BEGIN
    RETURN a[1]::numeric * 16777216 + a[2]::numeric * 65536 + a[3]::numeric * 256 + a[4]::numeric;
END;
$$
LANGUAGE plpgsql IMMUTABLE STRICT;

DROP FUNCTION IF EXISTS next_id(tbl text, tableschema text);
CREATE OR REPLACE FUNCTION next_id(tbl text, tableschema text = 'public')
RETURNS bigint AS
$$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id bigint;
    result bigint;
BEGIN
    SELECT nextval(tableschema || '."' || tbl || '_id_seq"') % 1024 INTO seq_id;
    /* An alternative shard id derived from the database name and server version:
    select substring(regexp_replace(md5(current_database()||inet_server_addr()||version()), '[^\\\d]+', '', 'g')::text from 1 for 6)::int into shard_id; */
    SELECT inet2num(inet_server_addr()) INTO shard_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    -- Note: only 13 bits (values 0..8191) sit between the sequence and the
    -- timestamp; if your servers' IPs produce larger values, mask the shard id
    -- (e.g. shard_id % 8192) so it does not overflow into the timestamp bits.
    result := result | (shard_id << 10);
    result := result | (seq_id);
    RETURN result;
END;
$$
LANGUAGE PLPGSQL;
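Assuming the bit layout above (timestamp shifted left by 23, a 13-bit shard id, a 10-bit sequence value), a generated id can be taken apart again for debugging. A minimal sketch; the literal id is a placeholder to replace with a real next_id() result:

```sql
-- Hypothetical decode of an id produced by next_id():
-- bits 23+ = milliseconds since our_epoch, bits 10-22 = shard, bits 0-9 = seq.
select (1234567890123456789 >> 23) + 1314220021721 as unix_millis,
       (1234567890123456789 >> 10) & 8191          as shard_id,
        1234567890123456789 & 1023                 as seq_id;
```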


Now we need to apply it to all our identifiers:
DROP FUNCTION IF EXISTS "reset_nextid" (tablename text, tableschema text);
CREATE OR REPLACE FUNCTION "reset_nextid" (tablename text, tableschema text)
RETURNS bigint AS
$body$
DECLARE
    id_type text;
BEGIN
    SELECT data_type
    from information_schema.columns c
    where "table_schema" = tableschema and "table_name" = tablename and column_name = 'id'
    INTO id_type;
    IF id_type <> 'bigint' THEN
        EXECUTE 'ALTER TABLE ' || tableschema || '."' || tablename || '" ALTER COLUMN id TYPE bigint;';
    END IF;
    EXECUTE 'ALTER TABLE ' || tableschema || '."' || tablename || '" ALTER COLUMN id SET DEFAULT next_id(''' || tablename || ''', ''' || tableschema || ''');';
    return next_id(tablename, tableschema);
END;
$body$
LANGUAGE 'plpgsql';

select t.*, reset_nextid(table_name, table_schema)
from (
    select t.table_schema, t.table_name, sequence_name, c.column_name
    from information_schema.sequences s
    left join information_schema.tables t
        on split_part(sequence_name, '_id_seq', 1) = table_name
    left join information_schema.columns c
        on (t.table_schema = c.table_schema and t.table_name = c.table_name and c.column_name = 'id')
    where c.column_name is not null
      and position('next_id' in c.column_default) <> 1
      and s.sequence_schema = t.table_schema
      and t.table_schema in ('public', 'acc')
    order by t.table_schema, t.table_name
    limit 50 offset 0
) as t


Since the identifier columns have already been converted to bigint, this query runs quickly.
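It is worth confirming on one table that the new default actually fires. A minimal check, assuming a hypothetical table public.clients with a name column:

```sql
-- Hypothetical verification: the insert should return a next_id()-generated
-- bigint, and the column default should now reference next_id.
insert into public.clients (name) values ('test') returning id;

select column_default
from information_schema.columns
where table_schema = 'public' and table_name = 'clients' and column_name = 'id';
-- expected to contain: next_id('clients', 'public')
```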

This completes the preparation: the database keeps working and is ready to run in parallel.

Bonus

If something went wrong with next_id, you can revert to the standard sequences:
CREATE OR REPLACE FUNCTION "restore_nextval" (tablename text, tableschema text = 'public')
RETURNS bigint AS
$body$
DECLARE
BEGIN
    EXECUTE 'ALTER TABLE ' || tableschema || '."' || tablename || '" ALTER COLUMN id SET DEFAULT nextval(''' || tablename || '_id_seq''::regclass);';
    return nextval((tableschema || '."' || tablename || '_id_seq"')::regclass);
END;
$body$
LANGUAGE 'plpgsql';

select t.*, restore_nextval(table_name, table_schema)
from (
    select t.table_schema, t.table_name, sequence_name, c.column_name
    from information_schema.sequences s
    left join information_schema.tables t
        on split_part(sequence_name, '_id_seq', 1) = table_name
    left join information_schema.columns c
        on (t.table_schema = c.table_schema and t.table_name = c.table_name and c.column_name = 'id')
    where c.column_name is not null
      and position('next_id' in c.column_default) > 0
      and s.sequence_schema = t.table_schema
      and t.table_schema in ('public', 'acc')
) as t


Thanks. I hope this was helpful to someone.

P.S. Do not try this on a 32-bit server. Upgrade the server first. And the application server too.

Source: https://habr.com/ru/post/136852/
