MVCC-4. Snapshots of data

Having considered the issues related to isolation , and having made a digression about organizing data at a low level , we last time talked in detail about the versatile lines and traced how the service information in the version header changes during various operations.

Today we will look at how consistent versions of data are obtained from string versions.

What is a snapshot of data

Physically, there may be several versions of the same line in the data pages. In addition, each transaction should see only one (or none) version of each row so that together they constitute an ACID-agreed data picture at a specific point in time.
')
Isolation in PostgreSQL is built on the basis of snapshots of data (snapshot): each transaction works with its own snapshot of data that “contains” data that was recorded before the snapshot was taken, and not “contains” data that has not yet been fixed. We have already seen that the insulation thus turns out to be more stringent than the standard requires, but not devoid of anomalies.

At the Read Committed isolation level, a snapshot is created at the beginning of each transaction statement. This snapshot is active while the statement is being executed. In the picture, the moment of the snapshot creation (which, as we remember, is determined by the transaction number) is shown in blue.

At the Repeatable Read and Serializable levels, a snapshot is created once at the beginning of the first transaction statement. This snapshot remains active until the very end of the transaction.

Visibility of row versions in snapshot

Visibility rules

Of course, the snapshot is not a physical copy of all the necessary versions of the lines. In fact, a snapshot is defined by several numbers, and the visibility of versions of strings in a snapshot is determined by the rules.

Whether or not this version of the row is visible in the snapshot depends on two fields of its header, xmin and xmax, that is, the numbers of the transaction that created and deleted. Such intervals do not overlap, so one line is represented in any snapshot with at most one version of it.

The exact rules of visibility are quite complex and take into account many different situations and extreme cases.

This is easily seen by looking at src / backend / utils / time / tqual.c (in version 12, the check moved to src / backend / access / heap / heapam_visibility.c).

Simplifying, we can say that the version of the line is visible when the changes made by the xmin transaction are visible in the snapshot and the changes made by the xmax transaction are not visible (in other words, it is already clear that the version of the line has appeared, but it is not yet clear that it has been deleted).

In turn, the transaction changes are visible in the snapshot, if either this is the same transaction that created the snapshot (she sees her own changes), or the transaction was fixed before the snapshot was taken.

You can depict transactions graphically in the form of segments (from the moment of beginning to the moment of fixation):

Here:

Changes to transaction 2 will be visible because it is completed before the snapshot is taken,
Changes to transaction 1 will not be visible because it was active at the time of the snapshot,
Changes to transaction 3 will not be visible, because it began after the snapshot was taken (no matter if it ended or not).

Unfortunately, the moment of committing transactions is unknown to the system. Only the moment of its beginning is known (it is determined by the number of the transaction and is marked in the figures above with a dotted line), but the fact of completion is not recorded anywhere.

All we can do is find out the current status of transactions when creating a snapshot. This information is in the shared memory of the server in the ProcArray structure, which contains a list of all active sessions and their transactions.

And after the fact, we will not be able to understand whether a transaction was active at the time of the snapshot creation or not. Therefore, a list of all current active transactions must be memorized in a snapshot.

It follows from the above that in PostgreSQL it is impossible to create a snapshot showing consistent data as of an arbitrary time ago, even if all the necessary versions of the rows exist in the table pages. One often hears the question why there are no retrospective (or temporal; in Oracle, this is called a flashback query) queries — this is one of the reasons.

It's funny that initially this functionality was, but later it was removed from the database. You can read about this in an article by Joseph Hellershtein .

So, a snapshot of the data is determined by several parameters:

the moment of the snapshot creation, namely, the number of the next transaction that does not yet exist in the system ( snapshot.xmax );
a list of active transactions at the time of the snapshot ( snapshot.xip ).

For convenience and optimization, the number of the earliest of the active transactions ( snapshot.xmin ) is stored separately. This value has an important meaning, which we will discuss below.

Also, several more parameters are saved in the snapshot, but they are not important for us.

Example

To see how visibility is determined by a snapshot, let's reproduce the three-transaction situation discussed above. The table will have three rows, and:

the first one is added by a transaction that started before the snapshot was taken, but was completed later,
the second one is added by a transaction that began and ended before the snapshot was taken,
the third was added after the snapshot was taken.

=> TRUNCATE TABLE accounts;

First transaction (not yet completed):

 => BEGIN; => INSERT INTO accounts VALUES (1, '1001', 'alice', 1000.00); => SELECT txid_current();

 => SELECT txid_current(); txid_current -------------- 3695 (1 row)

The second transaction (completed before creating the snapshot)

 | => BEGIN; | => INSERT INTO accounts VALUES (2, '2001', 'bob', 100.00); | => SELECT txid_current();

 | txid_current | -------------- | 3696 | (1 row)

 | => COMMIT;

We create a snapshot in a transaction in another session.

 || => BEGIN ISOLATION LEVEL REPEATABLE READ; || => SELECT xmin, xmax, * FROM accounts;

 || xmin | xmax | id | number | client | amount || ------+------+----+--------+--------+-------- || 3696 | 0 | 2 | 2001 | bob | 100.00 || (1 row)

Finish the first transaction after the snapshot is created:

 => COMMIT;

And the third transaction (appeared after the snapshot creation):

 | => BEGIN; | => INSERT INTO accounts VALUES (3, '2002', 'bob', 900.00); | => SELECT txid_current();

 | txid_current | -------------- | 3697 | (1 row)

 | => COMMIT;

Obviously, one line is still visible in our snapshot:

 || => SELECT xmin, xmax, * FROM accounts;

 || xmin | xmax | id | number | client | amount || ------+------+----+--------+--------+-------- || 3696 | 0 | 2 | 2001 | bob | 100.00 || (1 row)

The question is how PostgreSQL understands this.

Everything is determined by the snapshot. Let's look at it:

 || => SELECT txid_current_snapshot();

 || txid_current_snapshot || ----------------------- || 3695:3697:3695 || (1 row)

Here, the colon lists snapshot.xmin, snapshot.xmax, and snapshot.xip (in this case, one number, but in general - a list).

According to the rules formulated above, the snapshot should show the changes made by transactions with the numbers snapshot.xmin <= xid <snapshot.xmax, except for those listed in the snapshot.xip list. Let's look at all the rows in the table (in the new snapshot):

 => SELECT xmin, xmax, * FROM accounts ORDER BY id;

  xmin | xmax | id | number | client | amount ------+------+----+--------+--------+--------- 3695 | 0 | 1 | 1001 | alice | 1000.00 3696 | 0 | 2 | 2001 | bob | 100.00 3697 | 0 | 3 | 2002 | bob | 900.00 (3 rows)

The first line is not visible - it was created by a transaction that is included in the list of active (xip).
The second line is visible - it is created by a transaction that falls within the range of the snapshot.
The third line is not visible - it was created by a transaction that is not within the range of the snapshot.

 || => COMMIT;

Own changes

Somewhat complicates the picture is the case of determining the visibility of your own transaction changes. Here you may need to see only a part of such changes. For example, a cursor opened at a certain moment should not see changes made after that moment at any level of isolation.

To do this, there is a special field in the row header (which is displayed in the pseudo columns cmin and cmax), indicating the sequence number of the operation within the transaction. Cmin is the number to insert, cmax is to delete, but to save space in the row header, this is actually one field, not two different ones. It is considered that insertion and deletion of the same row in a single transaction is rarely performed.

If this does happen, then a special “combo” number is inserted into the same field, about which the serving process remembers the real cmin and cmax. But this is completely exotic.

A simple example. Start a transaction and add a row to the table:

 => BEGIN; => SELECT txid_current();

  txid_current -------------- 3698 (1 row)

 INSERT INTO accounts(id, number, client, amount) VALUES (4, 3001, 'charlie', 100.00);

Let's output the contents of the table along with the cmin field (but only for rows added by our transaction — for others it does not make sense):

 => SELECT xmin, CASE WHEN xmin = 3698 THEN cmin END cmin, * FROM accounts;

  xmin | cmin | id | number | client | amount ------+------+----+--------+---------+--------- 3695 | | 1 | 1001 | alice | 1000.00 3696 | | 2 | 2001 | bob | 100.00 3697 | | 3 | 2002 | bob | 900.00 3698 | 0 | 4 | 3001 | charlie | 100.00 (4 rows)

Now open the cursor for the query that returns the number of rows in the table.

 => DECLARE c CURSOR FOR SELECT count(*) FROM accounts;

And after that, add another line:

 => INSERT INTO accounts(id, number, client, amount) VALUES (5, 3002, 'charlie', 200.00);

The query returns 4 - the string added after the cursor is opened will not be included in the data snapshot:

 => FETCH c;

  count ------- 4 (1 row)

Why? Because in the snapshot only versions of lines with cmin <1 are taken into account.

 => SELECT xmin, CASE WHEN xmin = 3698 THEN cmin END cmin, * FROM accounts;

  xmin | cmin | id | number | client | amount ------+------+----+--------+---------+--------- 3695 | | 1 | 1001 | alice | 1000.00 3696 | | 2 | 2001 | bob | 100.00 3697 | | 3 | 2002 | bob | 900.00 3698 | 0 | 4 | 3001 | charlie | 100.00 3698 | 1 | 5 | 3002 | charlie | 200.00 (5 rows)

 => ROLLBACK;

Event horizon

The number of the earliest of the active transactions (snapshot.xmin) has an important meaning - it defines the “event horizon” of the transaction. Namely, beyond its horizon, a transaction always sees only current versions of strings.

Indeed, an irrelevant version is required to be seen only in the case when the current one is created by a transaction that has not yet ended, and therefore is not yet visible. But beyond the “horizon” all transactions are already guaranteed to be completed.

The “event horizon” of the transaction can be seen in the system directory:

 => BEGIN; => SELECT backend_xmin FROM pg_stat_activity WHERE pid = pg_backend_pid();

  backend_xmin -------------- 3699 (1 row)

You can also define an "event horizon" at the database level. To do this, take all the active images and find the oldest xmin among them. It will determine the horizon beyond which irrelevant versions of rows in this database will never be seen any transaction. Such versions of the lines can be cleared - that is why the concept of the horizon is so important from a practical point of view.

If a transaction holds a snapshot for a long time, it will also hold the event horizon of the database. Moreover, an incomplete transaction will hold the horizon by the very fact of its existence, even if it does not hold a snapshot.

This means that irrelevant versions of rows in this database cannot be cleared. At the same time, a “long-running” transaction may not intersect with other transactions in any way - this is completely irrelevant, the database has one horizon for all.

If now, in the form of a segment, not to depict transactions, but snapshots (from snapshot.xmin to snapshot.xmax), then the situation can be imagined as follows:

In this figure, the lowest snapshot refers to an incomplete transaction, and in the remaining snapshots, snapshot.xmin cannot be greater than its number.

In our example, a transaction was started with the Read Committed isolation level. Even though there is no active snapshot of data in it, it continues to hold the horizon:

 | => BEGIN; | => UPDATE accounts SET amount = amount + 1.00; | => COMMIT;

 => SELECT backend_xmin FROM pg_stat_activity WHERE pid = pg_backend_pid();

  backend_xmin -------------- 3699 (1 row)

And only after the completion of the transaction, the horizon moves forward, allowing you to clear out irrelevant versions of the lines:

 => COMMIT; => SELECT backend_xmin FROM pg_stat_activity WHERE pid = pg_backend_pid();

  backend_xmin -------------- 3700 (1 row)

If the described situation really creates problems and there is no way to avoid it at the application level, then, starting from version 9.6, two parameters are available:

old_snapshot_threshold determines the maximum lifetime of a snapshot. After this time, the server gets the right to delete irrelevant versions of strings, and if they are needed by a “long-playing” transaction, but it will receive a snapshot too old error.
idle_in_transaction_session_timeout determines the maximum lifetime of an inactive transaction. After this time, the transaction is aborted.

Export Snapshot

There are situations when several parallel transactions must be guaranteed to see the same data pattern. An example is the pg_dump utility, which can work in parallel mode: all workflows should see the database in the same state in order for the backup to be consistent.

Of course, one cannot rely on the fact that the pictures of the data will coincide simply because the transactions are running “at the same time”. For this there is a mechanism for exporting and importing a snapshot.

The pg_export_snapshot function returns a snapshot identifier that can be transferred (by means external to the DBMS) to another transaction.

 => BEGIN ISOLATION LEVEL REPEATABLE READ; => SELECT count(*) FROM accounts; --

  count ------- 3 (1 row)

 => SELECT pg_export_snapshot();

  pg_export_snapshot --------------------- 00000004-00000E7B-1 (1 row)

Another transaction can import a snapshot using the SET TRANSACTION SNAPSHOT command before executing the first query in it. You must first set the isolation level Repeatable Read or Serializable, because at the Read Committed level, operators will use their own snapshots.

 | => DELETE FROM accounts; | => BEGIN ISOLATION LEVEL REPEATABLE READ; | => SET TRANSACTION SNAPSHOT '00000004-00000E7B-1';

Now the second transaction will work with the snapshot of the first and, accordingly, see three lines (and not zero):

 | => SELECT count(*) FROM accounts;

 | count | ------- | 3 | (1 row)

The lifetime of the exported snapshot is the same as the lifetime of the export transaction.

 | => COMMIT; => COMMIT;

To be continued.

Source: https://habr.com/ru/post/446652/

All Articles