PostgreSQL: Case in Vacuum

One of our customers, who exploited PostgreSQL under heavy load, encountered a problem with the overflow of the transaction counter (xid wraparound), and there was no way out of it by regular means. We solved the problem with the help of a surgical intervention and released a patch to prevent such situations from occurring in the future.

In this post we will explain how and why a problem can occur and how to prevent it.

PostgreSQL Transaction Counter Device

One of the most attractive features of PostgreSQL is the ability to work in conditions of high competition for data: reading transactions do not block writing and vice versa. All this is due to the multi-versioning mechanism (MVCC). The implementation is based on the fact that each transaction in PostgreSQL has its own number (identifier), called xid. The numbers are increasing all the time, so that a transaction with a smaller number is considered to have begun earlier, and a transaction with a larger number — later. Each line in the table has, among other things, two additional system fields that are not shown in user requests: they are called xmin and xmax. The xmin field stores the number of the transaction that created this line, and xmax - the number of the transaction that deleted it (if, of course, this happened). Thus, each line can have several versions with a different scope. This approach to organizing data storage is called versioned.

PostgreSQL never changes user contents of a string, only system fields change. Updating data (UPDATE) marks the row as deleted, that is, sets xmax = xid_current, and creates a new copy of the row with updated content, which has xmin = xid_current.

When PostgreSQL reads data from tables, it always happens in the context of some snapshot. When you take a snapshot of the data, it stores the current transaction number, according to which, from several versions of the row, you can choose the one that will be visible in the context of the current snapshot. In addition, the snapshot includes a list of all transactions that are not completed at the moment, since changes made by such transactions should not fall into the snapshot.

String versions that are out of scope for all active transactions are no longer needed. To keep the database from overgrowth, a special background process, called autovacuum, removes old versions of rows — those in which xmax are the least active of all current transactions.

The transaction counter has a size of 32 bits, that is, it can store approximately four billion values. This, of course, not so much. There were proposals to make it 64-bit, but you should not forget that in this case, the amount of the database would increase noticeably at the expense of overheads - after all, xmin and xmax are stored in each row. Imagine that the limit of 2 ^ 32-1 is reached. We add one and the counter overflows and is reset to zero. This would be catastrophic, as PostgreSQL expects transaction numbers to always increase.

Of course, there is a mechanism to prevent such a situation. First of all, the transaction number space is looped around: underneath, it’s actually not a transaction that has a lower number, but one that is less than half a circle from the other. Secondly, during the cleaning (VACUUM) of the tables, so-called freezing is performed. The vacuum / autovacuum process, in addition to deleting dead lines with the old xmax, also processes live lines with the old xmin value. Lines, whose xmin is much smaller than the oldest of the running transactions and the “age” exceeds the vacuum_freeze_min_age, are “frozen” (marked with special service bits). They cease to obey the usual rules of visibility and are always considered older than any ordinary transaction. Thus, cleaning constantly freezes old lines, following in a circle behind the transaction counter.

transaction counter in postgreSQL

The age of the oldest transaction in the database is stored in the system catalog:

SELECT datname, age(datfrozenxid) FROM pg_database;

Also, statistics for each table is kept:

 SELECT relname, age(relfrozenxid) FROM pg_class;

Background processes monitor transactions automatically, but when using PostgreSQL, the administrator must ensure that the age of the oldest transaction in the database does not approach 2 ^ 31 (half a circle, half of all acceptable values). Then PostgreSQL can guarantee the accuracy of determining the age of the transaction (taking into account the cyclical nature of the counter). If the age of datfrozenxid is close to this point, then PostgreSQL will not be able to issue transaction numbers anymore and will stop working for reasons of data integrity, requiring manual intervention and cleaning (VACUUM).
That is why super-long transactions should be avoided, during which the counter has time to increase by 2 billion.

Problem and treatment

It is overflow datfrozenxid occurred at one of our clients. The administrator manually launched the VACUUM FREEZE command, which worked for 8 days. During this time, about 2 ^ 31 new transaction numbers were issued under load. It is worth noting that, although VACUUM works outside the transaction, but at startup it creates a snapshot of the data with which it distinguishes outdated versions of rows from the actual ones. The system stopped and required manual intervention, but, despite the fact that the maintenance was carried out, was no longer able to start up in operation.

The problem was that before updating the variables located in the shared memory, the VACUUM team tried to get a new transaction number to make sure that the freezing was correct and there are no “future” transactions in the system. As the available numbers ended, the command ended with an error, due to which the variables responsible for the range of available numbers were not updated. To correct the problem, two solutions were developed: operational and permanent.

An operational solution was required in order to resume the work of the customer's database as quickly as possible. To do this, we had to manually edit the variables in shared memory.

Select the number (xid) of the oldest transaction recorded in the database: SELECT datfrozenxid FROM pg_database ORDER BY age(datfrozenxid) DESC LIMIT 1;
Connect using gdb to any PostgreSQL process and execute the command: set ShmemVariableCache->oldestXid = <_xid>
Restart PostgreSQL

As a result of these actions, the values responsible for the range of available transactions were updated both in the shared memory and on the disk, and the customer's DBMS was again able to continue normal operation.

A permanent solution to the problem is that the VACUUM team does not receive a separate transaction number at all. The patch and instructions for reproducing the situation were sent to the hackers mailing list. This fix will be included in all versions of PostgreSQL .

Changes in 9.6

The unpleasant property of freezing is that for this you need to scan the entire table. Normal cleaning (vacuum) works smarter: if during the work it turns out that all versions of lines on the page are relevant (that is, xmax = 0), such a page is marked in a special file called visibility map. These vacuum pages are no longer returned until any changes occur (at which the visibility map is automatically unchecked).

Unfortunately, freezing does not use a visibility card: after all, even on pages with only current versions of rows, the not yet frozen transaction numbers in the xmin field may refuse. A periodic full scan can cause problems with a very large table size.

In PostgreSQL 9.6, the beta version of which has already been released, this difficulty has been overcome. The visibility map is now expanded to contain a “freeze map”: it will mark pages on which all transactions have already been frozen.

Transaction Counter Overflow Monitoring

To control overflow, you need to view the datfrozenxid transaction age from the pg_database system directory. If you are using the Zabbix system, try the mamonsu monitoring client, which already has the desired metric. The client is available at: mamonsu .

findings

Complex products such as relational databases are never completely free of errors. Despite the reliability of PostgreSQL, you may run into unpleasant problems when operating. The community provides support, but, firstly, you will have to do a lot of work on the design of the error message (so that the developers can reproduce your situation) and secondly, no one guarantees the period of correction.

That is why, working with business-critical systems, it is useful to have technical support from a vendor company whose developers understand the source code, can make the necessary corrections and do it in the shortest possible time.

Source: https://habr.com/ru/post/301238/

All Articles