InterSystems IRIS Global Transactions

InterSystems IRIS DBMS supports curious data storage structures - globals. In fact, these are multi-level keys with various additional benefits in the form of transactions, quick functions for traversing data trees, locks and their own ObjectScript language.

More about globals in the series of articles “Globals - storage swords-treasures”:

The trees. Part 1
The trees. Part 2
Sparse arrays. Part 3
')
It became interesting to me how transactions are implemented in the global, what features are there. After all, this is a completely different structure for storing data than the usual tables. Much lower level.

As you know from relational database theory, a good transaction implementation must satisfy ACID requirements:

A - Atomic (atomicity). All changes made to the transaction or none at all are recorded.

C - Consistency. After the transaction is completed, the logical state of the database must be internally consistent. In many ways, this requirement applies to the programmer, but in the case of SQL databases, it also applies to foreign keys.

I - Isolate (isolation). Parallel transactions should not affect each other.

D - Durable. After the transaction is successfully completed, problems at the lower levels (power failure, for example) should not affect the data changed by the transaction.

Globals are non-relational data structures. They were created for ultrafast work on very limited hardware. Let's understand the implementation of transactions in globals using the official IRIS docker image .

To support transactions in IRIS, the following commands are used: TSTART , TCOMMIT , TROLLBACK .

1. Atomicity

The easiest way to check atomicity. Checking from the database console.

Kill ^a TSTART Set ^a(1) = 1 Set ^a(2) = 2 Set ^a(3) = 3 TCOMMIT

Then we conclude:

 Write ^a(1), “ ”, ^a(2), “ ”, ^a(3)

We get:

 1 2 3

Everything is good. Atomicity observed: all changes are recorded.

We complicate the task, introduce an error and see how the transaction is saved, partially or not at all.

Let's check atomicity once again:

 Kill ^A TSTART Set ^a(1) = 1 Set ^a(2) = 2 Set ^a(3) = 3

Then forcibly stop the container, start and see.

 docker kill my-iris

This command is almost equivalent to forcibly turning off the power, as it sends a signal to immediately stop the SIGKILL process.

Maybe the transaction was partially saved?

 WRITE ^a(1), ^a(2), ^a(3) ^ <UNDEFINED> ^a(1)

- No, not preserved.

Test the rollback command:

 Kill ^A TSTART Set ^a(1) = 1 Set ^a(2) = 2 Set ^a(3) = 3 TROLLBACK WRITE ^a(1), ^a(2), ^a(3) ^ <UNDEFINED> ^a(1)

Nothing was preserved either.

2. Consistency

Since in databases on globals, keys are also made on globals (I recall that a global is a lower-level structure for storing data than a relational table), to fulfill the consistency requirement, you need to include the key change in the same transaction as the global change.

For example, we have a global ^ person in which we store personalities and we use the TIN as a key.

 ^person(1234567, 'firstname') = 'Sergey' ^person(1234567, 'lastname') = 'Kamenev' ^person(1234567, 'phone') = '+74995555555 ...

In order to have a quick search by last name and first name, we made the key ^ index.

 ^index('Kamenev', 'Sergey', 1234567) = 1

In order for the base to be agreed, we must add personalities like this:

 TSTART ^person(1234567, 'firstname') = 'Sergey' ^person(1234567, 'lastname') = 'Kamenev' ^person(1234567, 'phone') = '+74995555555 ^index('Kamenev', 'Sergey', 1234567) = 1 TCOMMIT

Accordingly, when deleting, we should also use the transaction:

 TSTART Kill ^person(1234567) ZKill ^index('Kamenev', 'Sergey', 1234567) TCOMMIT

In other words, the fulfillment of the consistency requirement rests entirely with the programmer. But when it comes to globals, this is normal, due to their low-level nature.

3. Isolation

This is where the wilds begin. Many users simultaneously work on the same database, modify the same data.

The situation is comparable to the situation when many users simultaneously work with the same repository with the code and try to commit changes to many files at once in it.

The database should resolve this in real time. Given that in serious companies there is even a special person who is responsible for version control (for merging branches, resolving conflicts, etc.), and the database should do all this in real time, it becomes apparent the complexity of the task and the correct design of the database and the code that serves it.

The database cannot understand the meaning of the actions performed by users in order to prevent conflicts if they work on the same data. It can only cancel one transaction contrary to another or execute them sequentially.

Another problem is that during the execution of the transaction (before the commit), the state of the database may be inconsistent, so it is desirable that other transactions do not have access to the inconsistent state of the database, which is achieved in relational databases in many ways: creating snapshots, multiversion of rows and etc.

In parallel execution of transactions, it is important for us that they do not interfere with each other. This is the property of isolation.

SQL defines 4 isolation levels:

READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE

Let's consider each level separately. The costs of implementing each level are growing almost exponentially.

READ UNCOMMITTED is the lowest level of isolation, but the fastest. Transactions can read changes made by each other.

READ COMMITTED is the next level of isolation, which is a compromise. Transactions cannot read changes made by each other before a commit, but can read any changes made after a commit.

If we have a long transaction T1, during which commits took place in transactions T2, T3 ... Tn, which worked with the same data as T1, then when we request data in T1, we will get different results each time. This phenomenon is called non-repeatable reading.

REPEATABLE READ - in this isolation level, we do not have the phenomenon of non-repeatable reading, due to the fact that for each request to read data, a snapshot of the result data is created and when reused in the same transaction, the data from the snapshot is used. However, at this isolation level, phantom data can be read. This refers to reading new lines that were added by concurrent committed transactions.

SERIALIZABLE - The highest level of isolation. It is characterized by the fact that the data used in any way in a transaction (reading or changing) becomes available to other transactions only after the completion of the first transaction.

First, let’s figure out whether there is isolation of operations in a transaction from the main thread. Let's open 2 terminal windows.

 Kill ^t Write ^t(1) 2

 TSTART Set ^t(1)=2

There is no isolation. One thread sees what the second who opened the transaction does.

Let's see if transactions of different flows see what is happening inside them.

Let's open 2 terminal windows and open 2 transactions in parallel.

 kill ^t TSTART Write ^t(1) 3

 TSTART Set ^t(1)=3

Concurrent transactions see each other's data. So, we got the simplest, but also the fastest isolation level READ UNCOMMITED.

In principle, this could be expected for globals, for whom speed has always been paramount.

But what if we need a higher level of isolation in global operations?

Here you need to think about why insulation levels are needed and how they work.

The highest isolation level of SERIALIZE means that the result of concurrently executed transactions is equivalent to their sequential execution, which guarantees the absence of collisions.

We can do this with the help of competent locks in ObjectScript, which have a lot of different ways of application: you can do regular, incremental, multiple locks with the LOCK command.

Lower isolation levels are trade-offs designed to increase the speed of the database.

Let's see how we can achieve different levels of isolation using locks.

This operator allows you to take not only the exclusive locks needed to change data, but also the so-called shared ones, which can take several threads at once, when they need to read data that should not be changed by other processes during reading.

More information about the two-phase locking method in Russian and English:

→ Two phase lock
→ Two-phase locking

The difficulty is that during the transaction the state of the database may be inconsistent, however, this inconsistent data is visible to other processes. How to avoid this?

Using locks, we will make such windows of visibility in which the state of the database will be agreed. And all calls to such windows of visibility of the agreed state will be controlled by locks.

Shared locks of the same data are reusable - several processes can take them. These locks prevent other processes from changing data, i.e. they are used to form windows of a consistent database state.

Exclusive locks are used to modify data - only one process can take such a lock. Exclusive blocking can take:

Any process if data is free
Only the process that has a shared lock on this data and the first one requested an exclusive lock.

The narrower the visibility window, the longer other processes have to wait for it, but the more consistent the state of the database in it can be.

READ_COMMITED - the essence of this level is that we only see data from other streams that are locked. If the data in another transaction is not commited yet, then we see their old version.

This allows us to parallelize the work instead of waiting for the lock to be released.

Without special tricks, we will not be able to see the old version of the data in IRIS, so we will have to do with locks.

Accordingly, we will have to use shared locks to allow reading of data only at moments of consistency.

Suppose we have a user base ^ person who transfer money to each other.

The moment of transfer from person 123 to person 242:

 LOCK +^person(123), +^person(242) Set ^person(123, amount) = ^person(123, amount) - amount Set ^person(242, amount) = ^person(242, amount) + amount LOCK -^person(123), -^person(242)

The moment of requesting the amount of money from person 123 before debiting must be accompanied by an exclusive lock (by default):

 LOCK +^person(123) Write ^person(123)

And if you need to show the account status in your account, you can use shared lock or not use it at all:

 LOCK +^person(123)#”S” Write ^person(123)

However, if we assume that database operations are performed almost instantly (I recall that globals are a much lower level structure than a relational table), then the need for this level drops.

REPEATABLE READ - In this isolation level, it is assumed that there can be multiple reads of data that can be modified by concurrent transactions.

Accordingly, we will have to put a shared lock on reading the data that we are changing and exclusive locks on the data that we are changing.

Fortunately, the LOCK operator allows one operator to list in detail all the necessary locks, of which there can be a lot.

 LOCK +^person(123, amount)#”S”  ^person(123, amount)

other operations (at this time, parallel threads try to change ^ person (123, amount), but cannot)

 LOCK +^person(123, amount)  ^person(123, amount) LOCK -^person(123, amount)  ^person(123, amount) LOCK -^person(123, amount)#”S”

When listing locks separated by commas, they are taken sequentially, and if you do so:

 LOCK +(^person(123),^person(242))

then they are taken atomically all at once.

SERIALIZE - we will have to set the locks so that ultimately all transactions that have common data are executed sequentially. For this approach, most locks should be exclusive and taken to the smallest areas of the global for performance.

If we talk about write-offs in the global ^ person, then only the SERIALIZE isolation level is acceptable for him, since money must be spent strictly sequentially, otherwise it is possible to spend the same amount several times.

4. Durability

I conducted tests with hard cutting down the container through

 docker kill my-iris

The base tolerated them well. No problems were identified.

Conclusion

For globals, InterSystems IRIS has transaction support. They are truly atomic, reliable. To ensure database consistency on globals, the programmer’s efforts and the use of transactions are necessary, since there are no complex built-in constructions such as foreign keys.

The isolation level for globals without the use of locks is READ UNCOMMITED, and when using locks, it can be ensured up to the SERIALIZE level.

The correctness and speed of transactions on globals very much depends on the skill of the programmer: the more widely used shared locks when reading, the higher the isolation level, and the more exclusive locks are taken, the greater the speed.

Source: https://habr.com/ru/post/461753/

All Articles