📜 ⬆️ ⬇️

Unique key in a distributed database

In case you are sharing data across multiple physical databases,
support for globally unique identifiers is not such a trivial task.
I tried to put together possible options and consider their pros and cons.


What requirements (except for uniqueness) can be placed on keys?



The key should monotonously increase
Why do you need it:

The above is true only for indexes using trees.
')
The key must be generated on the client side
Why do you need it:


Limit key size (for example, only 64 bits)
Why do you need it:


What are the ways to generate unique keys?



ID generation on the application side (UUID, etc.)

With UUID, everything is relatively simple - we take the library and generate keys on the application side. UUID is widely used in many systems. The value is formed in such a way that collisions are impossible.

Minuses:
• The length of the standard UUID is 128 bits (which, in fact, you often want to store directly as a string of hexadecimal digits, which is already 32 bytes)
• As a rule, the UUID does not provide a natural key sorting.

Separate service for key generation

This option is used, for example, Flick and Twitter .

Minuses:
• complication of the system due to the introduction of additional components
• a potentially single point of failure, or additional efforts are required to ensure high availability of the key generator

Auto-increment on database side by value range

The whole range of values ​​is divided into subranges and each shard (cluster node) is responsible for its own range, for example:
     1 - 0 .. 3FFFFFFF,
     2 - 40000000 .. 7FFFFFFF,
     3 - 80000000 .. BFFFFFFF,
     4 - C0000000 .. FFFFFFFF

Minuses:
• DB must support auto-increment (not applicable to many NoSQL-storages)
natural sorting is impossible
• The client "does not know" the key before inserting the object. To find out the key you need to make a separate request.
• It is almost impossible to increase the number of cluster nodes.

Instagram autoincrement

This decision is described in the technical blog of the Instgram team (photo sharing application).

They use a 64-bit key that is generated on the DB (PostgreSQL) side and consists of bit fields.

63 31 0 |======|======|======|======|======|======|======|======| | CUSTOM TIMESTAMP | SHARD ID | AUTO | 

Where
- CUSTOM TIMESTAMP (41 bits) - time in milliseconds from 2011-01-01 00:00:00
- SHARD ID (13 bits) - logical partition identifier (the number of shards is greater than the number of physical nodes)
- AUTO (10 bits) - sequence (sequence), unique within the logical partition (shard)

Pluses (compared to autoincrement by ranges):
• Automatic chronological sorting

And what options do you know and use?

Source: https://habr.com/ru/post/135364/


All Articles