📜 ⬆️ ⬇️

Again about AUTO_INCREMENT

Everyone who works with databases knows what AUTO_INCREMENT is. About him a lot has been written, including in Habré . In this article I want to express my thoughts on this topic, because earlier I had not met any reasoning in this particular way. But first, let's decide why we need a database at all.

Indeed, we will keep everything in RAM. Although, no, it will not be enough. And in case of an unexpected shutdown, all data will be lost. We put the UPS. No, better than 2. Better yet, imagine that our memory is infinite and does not lose data when it is turned off.

Now you can write something like this:

Order order = new Order(); User user = new User(); order.creator = user; 

And to create at least 10 such orders, at least 20, at least 100 ... Hmm, I created a lot of things, it would be necessary to have a collection of some kind. And to her search in the fields. And the query language so that everything is universal. Plus indexes on these fields to speed up the search.
')
What have we got? It turned out almost the usual database. With one exception - objects have no primary key. All objects are identified by address in the address space.

Now you can remember that the RAM has its limitations, the programs are closed, the computers are turned off, data may be needed by several programs at once. So, you need to do some abstraction over RAM, so that everyone has the same address. Approximately as virtual addressing in protected processor mode. And store it in files on disk. And make a control system.

This leads to the idea why integer keys are easy to use. The reasons are not only in the implementation of database management systems. The database is the address space for placing objects. And the integer key (ID) is a reference to the object.

From this it follows that auto_increment within the same table is not entirely correct. Each entry in the database must have a unique address. It turns out like a two-dimensional address space - addresses grow in one direction, the objects themselves in another; the size of one object does not affect the addresses of neighboring objects. It does not matter if the database is distributed or not, how many servers, databases and tables are in it. This is one address space, and addressing must be unambiguous.

It turns out, theoretically, the record can be found by key without specifying a table; or vice versa, you can find the table itself, in which there is an entry with such a key. You can even break the address space into ranges: for example, up to 1,000,000 system tables, settings, directories; after 1,000,000 real data. In the event of overflow, you can add a predetermined sufficiently large constant to a range, or you can pre-configure a range map.

However, the ID does not need to be attributed to the attributes of the object itself as an element of the data model. For example:

 int x = 2; 

The variable x has the value 2 and the address 0x123456. But it cannot be said that the address is an attribute of integer values. It is also impossible to say that ID is an attribute of objects of type User and Order. It serves simply to connect the abstract model and the technical implementation.

All the limitations of the natural key must be done with additional technical means, at the level of the database itself or the application. In fact, I cannot even imagine a situation in which the natural primary key is “natural.” All options such as passport, telephone number, TIN - this is an artificially entered numbering of objects, precisely because it is impossible to allocate from them or their owners a unique set of features. And it can be used only for the identification of precisely these objects - issued passports, telephone network nodes, an economic entity paying taxes. In reality, there is always the possibility of the existence of duplicates with the same properties, and a set of different properties at different times in one entity.

PS: My reasoning is theoretical, I did not work with databases with a similar structure. Technically, this is easy to do - one sequence for all tables. If someone has this experience, please share information in the comments - pros, cons, pitfalls.

Source: https://habr.com/ru/post/255645/


All Articles