So far we have become acquainted with the workings of the buffer cache and, using it as an example, seen that because the contents of RAM are lost on a failure, a write-ahead log is needed for recovery. The size of the required log files and the recovery time are kept bounded by a periodically executed checkpoint.
In previous articles we have already looked at quite a few important settings related in one way or another to the write-ahead log. In this article (the last in the series) we will consider the tuning questions that have not yet been discussed: the log levels and what they are for, as well as the reliability and performance of logging.
Log levels
The primary purpose of the write-ahead log is to make crash recovery possible. But since the log has to be kept anyway, it can be adapted to other tasks by adding a certain amount of extra information to it. There are several levels of logging, set by the wal_level parameter, and they are organized so that the log of each level includes everything that goes into the log of the previous level plus something new.
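The available levels can also be peeked at right from SQL; here is a simple query against the pg_settings view (the enumvals column lists minimal, replica, and logical):
=> SELECT name, setting, enumvals FROM pg_settings WHERE name = 'wal_level';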
Minimal
The lowest possible level is selected by wal_level = minimal and guarantees only crash recovery. To save space, operations related to bulk data processing (such as CREATE TABLE AS SELECT or CREATE INDEX) are not logged. Instead, the necessary data is written directly to disk, and the new object is added to the system catalog and becomes visible when the transaction commits. If a failure occurs while the operation is in progress, the data already written remains invisible and does not violate consistency. If a failure occurs after the operation has completed, everything needed has already reached the disk and does not have to be logged.
Let's check. First, set the required level (this also requires changing another parameter, max_wal_senders).
=> ALTER SYSTEM SET wal_level = minimal; => ALTER SYSTEM SET max_wal_senders = 0;
student$ sudo pg_ctlcluster 11 main restart
Note that changing the level requires a server restart.
Remember the current position in the log:
=> SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/353927BC (1 row)
Now let's create a table (CREATE TABLE AS SELECT) and note the log position again. The amount of data selected by the SELECT statement does not matter here, so we limit ourselves to a single row.
=> CREATE TABLE wallevel AS SELECT 1 AS n; => SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/353A7DFC (1 row)
With the familiar pg_waldump utility, let's look at the log entries.
postgres$ /usr/lib/postgresql/11/bin/pg_waldump -p /var/lib/postgresql/11/main/pg_wal -s 0/353927BC -e 0/353A7DFC
Some details may of course differ from run to run, but here is what we got in this case. The Heap2 resource manager record is related to vacuuming; here it is an in-page vacuum of one of the system catalog tables (system objects are easy to spot with the naked eye by the "short" number in rel):
rmgr: Heap2 len (rec/tot): 59/ 7587, tx: 0, lsn: 0/353927BC, prev 0/35392788, desc: CLEAN remxid 101126, blkref #0: rel 1663/16386/1247 blk 8 FPW
Then there is a record about getting the next OID for the table that we are going to create:
rmgr: XLOG len (rec/tot): 30/ 30, tx: 0, lsn: 0/35394574, prev 0/353927BC, desc: NEXTOID 82295
Now the actual creation of the table:
rmgr: Storage len (rec/tot): 42/ 42, tx: 0, lsn: 0/35394594, prev 0/35394574, desc: CREATE base/16386/74103
Note, however, that the insertion of data into the table itself is not logged. Then follow numerous records about inserting rows into various tables and indexes: this is PostgreSQL registering the created table in the system catalog (shown here in abbreviated form):
rmgr: Heap len (rec/tot): 203/ 203, tx: 101127, lsn: 0/353945C0, prev 0/35394594, desc: INSERT off 71, blkref #0: rel 1663/16386/1247 blk 8 rmgr: Btree len (rec/tot): 53/ 685, tx: 101127, lsn: 0/3539468C, prev 0/353945C0, desc: INSERT_LEAF off 37, blkref #0: rel 1663/16386/2703 blk 2 FPW ... rmgr: Btree len (rec/tot): 53/ 2393, tx: 101127, lsn: 0/353A747C, prev 0/353A6788, desc: INSERT_LEAF off 10, blkref #0: rel 1664/0/1233 blk 1 FPW
And finally, the transaction commit record:
rmgr: Transaction len (rec/tot): 34/ 34, tx: 101127, lsn: 0/353A7DD8, prev 0/353A747C, desc: COMMIT 2019-07-23 18:59:34.923124 MSK
Replica
When we restore the system from a backup, we start from some state of the file system and gradually bring the data up to the recovery point by replaying the archived log records. The number of such records can be very large (covering, say, several days), so the recovery interval spans not one checkpoint but many. It is therefore clear that the minimal level is not enough: if an operation is not logged, we simply will not know that it has to be repeated. For recovery from a backup, all operations must be logged.
The same is true for replication: everything that is not logged will not reach the replica and will not be replayed. And if we want to run queries on the replica, things get more complicated still.
First, we need information about the exclusive locks taken on the primary server, since they may conflict with queries on the replica. Such locks are written to the log and applied on the replica (by the startup process).
Second, we need to be able to build data snapshots, and for that, as we remember, information about running transactions is required. On a replica this includes not only local transactions but also transactions on the primary server. The only way to pass this information along is to write it to the log periodically (which happens every 15 seconds).
The log level that guarantees both recovery from a backup and the possibility of physical replication is set by wal_level = replica. (Before version 9.6 there were two separate levels, archive and hot_standby, but they were later merged into one.)
Starting with PostgreSQL 10 this is the default level (before that it was minimal). So we simply reset the parameters to their default values:
=> ALTER SYSTEM RESET wal_level; => ALTER SYSTEM RESET max_wal_senders;
student$ sudo pg_ctlcluster 11 main restart
We drop the table and repeat exactly the same sequence of actions as before:
=> DROP TABLE wallevel; => SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/353AF21C (1 row)
=> CREATE TABLE wallevel AS SELECT 1 AS n; => SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/353BE51C (1 row)
Now let's check the log records.
postgres$ /usr/lib/postgresql/11/bin/pg_waldump -p /var/lib/postgresql/11/main/pg_wal -s 0/353AF21C -e 0/353BE51C
Vacuuming, getting an OID, creating the table, and registering it in the system catalog - so far everything is as before:
rmgr: Heap2 len (rec/tot): 58/ 58, tx: 0, lsn: 0/353AF21C, prev 0/353AF044, desc: CLEAN remxid 101128, blkref #0: rel 1663/16386/1247 blk 8 rmgr: XLOG len (rec/tot): 30/ 30, tx: 0, lsn: 0/353AF258, prev 0/353AF21C, desc: NEXTOID 82298 rmgr: Storage len (rec/tot): 42/ 42, tx: 0, lsn: 0/353AF278, prev 0/353AF258, desc: CREATE base/16386/74106 rmgr: Heap len (rec/tot): 203/ 203, tx: 101129, lsn: 0/353AF2A4, prev 0/353AF278, desc: INSERT off 73, blkref #0: rel 1663/16386/1247 blk 8 rmgr: Btree len (rec/tot): 53/ 717, tx: 101129, lsn: 0/353AF370, prev 0/353AF2A4, … rmgr: Btree len (rec/tot): 53/ 2413, tx: 101129, lsn: 0/353BD954, prev 0/353BCC44, desc: INSERT_LEAF off 10, blkref #0: rel 1664/0/1233 blk 1 FPW
But here is something new: a record of an exclusive lock, belonging to the Standby resource manager. In this case it is the lock taken by our transaction on the newly created table (why it is needed, we will discuss in detail in the next series of articles):
rmgr: Standby len (rec/tot): 42/ 42, tx: 101129, lsn: 0/353BE2D8, prev 0/353BD954, desc: LOCK xid 101129 db 16386 rel 74106
And here is the record of inserting a row into our table (compare the rel file number with the one in the CREATE record above):
rmgr: Heap len (rec/tot): 59/ 59, tx: 101129, lsn: 0/353BE304, prev 0/353BE2D8, desc: INSERT+INIT off 1, blkref #0: rel 1663/16386/74106 blk 0
Commit record:
rmgr: Transaction len (rec/tot): 421/ 421, tx: 101129, lsn: 0/353BE340, prev 0/353BE304, desc: COMMIT 2019-07-23 18:59:37.870333 MSK; inval msgs: catcache 74 catcache 73 catcache 74 catcache 73 catcache 50 catcache 49 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 74106 snapshot 1214
And one more record, which appears periodically and is not tied to the committed transaction; it belongs to the Standby resource manager and reports the transactions currently running:
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/353BE4E8, prev 0/353BE340, desc: RUNNING_XACTS nextXid 101130 latestCompletedXid 101129 oldestRunningXid 101130
Logical
Finally, the last level is selected by wal_level = logical and enables logical decoding and logical replication. It has to be turned on on the publishing server.
In terms of log records this level differs very little from replica: records related to replication origins are added, as well as arbitrary logical records that applications may add to the log. For the most part, logical decoding relies on the information about running transactions, since a data snapshot has to be built to track changes in the system catalog.
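We will not actually switch to this level here, but for reference, a sketch of enabling it looks just like the other level changes and also requires a server restart:
=> ALTER SYSTEM SET wal_level = logical;
student$ sudo pg_ctlcluster 11 main restart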
We will not go into the details of how backup and replication work here: that is a big topic for a separate series of articles.
Write reliability
Clearly, the logging mechanism must be reliable and guarantee recovery in any situation (short of damage to the storage media itself, of course). Many factors affect reliability; we will consider caching, data corruption, and the atomicity of writes.
Caching
On its way to non-volatile storage (such as a hard disk), data passes through numerous caches.
When a program (any program, but in our case PostgreSQL) asks the operating system to write something to disk, the OS copies the data into its own cache in RAM. The actual write happens asynchronously, depending on the settings of the OS I/O scheduler.
When the OS does decide to write the data, it lands in the cache of the drive itself. The drive electronics can also postpone the write, for example by collecting data into groups that are cheaper to write together. And if a RAID controller is used, yet another level of caching appears between the OS and the drive.
So unless special measures are taken, it is completely unclear when the data is actually stored safely. Usually this does not matter, but there are critical places where PostgreSQL has to be sure that the data has been written reliably. Above all this concerns the log (if a log record has not reached the disk, it will vanish together with the rest of the RAM contents) and the checkpoint (which must be certain that dirty pages have really made it to disk). But there are other situations as well, such as performing unlogged operations at the minimal level and so on.
The operating system provides facilities that are supposed to guarantee that data reaches non-volatile storage immediately. There are several options, but they boil down to two main ones: either a synchronization call (fsync, fdatasync) is issued after the write, or a special flag is passed when opening the file (or writing to it) that requests synchronization or even direct writes bypassing the OS cache.
As for the log, the pg_test_fsync utility helps choose the method best suited to a particular OS and file system; the chosen method is then set in the wal_sync_method configuration parameter. All other files are always synchronized with fsync.
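For example, a run of pg_test_fsync might look like this (a sketch: the -s option shortens each test, and the test file is deliberately placed next to the log so that the same device is measured; the output reports the number of synchronization operations per second for each method):
postgres$ /usr/lib/postgresql/11/bin/pg_test_fsync -s 2 -f /var/lib/postgresql/11/main/pg_wal/pg_test_fsync.tmp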
The subtle point is that hardware characteristics must be taken into account when choosing a method. For example, if the controller has a battery-backed cache, there is no reason not to use that cache, since the battery will preserve the data in the event of a power failure.
The documentation contains many details on this subject.
In any case, synchronization is expensive and is performed no more often than strictly necessary (we will come back to this a little later, when we discuss performance).
Generally speaking, synchronization can be switched off altogether (the fsync parameter controls this), but in that case you can forget about storage reliability. By disabling fsync you agree that data can be irretrievably lost at any moment. Probably the only reasonable use of this option is a temporary performance boost when the data can easily be restored from another source (for example, during an initial migration).
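A hedged sketch of such a temporary trade-off might look as follows (fsync only needs a configuration reload; the bulk-load step is just a placeholder, and after re-enabling the parameter you still have to make sure the data accumulated in the OS cache actually reaches disk, for example with an OS-level sync):
=> ALTER SYSTEM SET fsync = off;
=> SELECT pg_reload_conf();
-- ...perform the bulk load from the external source...
=> ALTER SYSTEM RESET fsync;
=> SELECT pg_reload_conf();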
Data corruption
Hardware is imperfect, and data can get corrupted on the media, while being transferred over interface cables, and so on. Some of these errors are handled at the hardware level, but others are not.
To detect problems in time, log records always carry checksums.
Data pages can also be protected with checksums. For now this can only be done at cluster initialization, but in PostgreSQL 12 it becomes possible to switch them on and off with the pg_checksums utility (not yet on the fly, though, only with the server stopped).
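On version 12 this might look approximately as follows (a sketch that assumes a hypothetical version 12 cluster laid out in the same Debian way; the server must be cleanly shut down first):
student$ sudo pg_ctlcluster 12 main stop
postgres$ /usr/lib/postgresql/12/bin/pg_checksums --enable --progress -D /var/lib/postgresql/12/main
student$ sudo pg_ctlcluster 12 main start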
In a production environment checksums should definitely be enabled, despite the overhead of computing and verifying them. This reduces the probability that corruption will go undetected.
It reduces it, but does not eliminate it.
First, checksums are verified only when a page is accessed, so corruption can go unnoticed long enough to end up in every backup. That is why pg_probackup verifies the checksums of all cluster pages while taking a backup.
Second, a page filled with zeros is considered valid, so if the file system mistakenly "zeroes out" a file, this can go unnoticed.
Third, checksums protect only the main fork of data files. The other forks and other files (for example, XACT transaction status data) are not protected by anything.
Alas.
Let's see how this works. First, make sure that checksums are enabled (note that on Debian-like systems this is not the default when installing from packages):
=> SHOW data_checksums;
data_checksums ---------------- on (1 row)
The
data_checksums parameter is read-only.
Here is the file in which our table is located:
=> SELECT pg_relation_filepath('wallevel');
pg_relation_filepath ---------------------- base/16386/24890 (1 row)
Stop the server and change a few bytes in page zero, for example, erase from the page header the LSN of the last log record.
student$ sudo pg_ctlcluster 11 main stop
postgres$ dd if=/dev/zero of=/var/lib/postgresql/11/main/base/16386/24890 oflag=dsync conv=notrunc bs=1 count=8
8+0 records in 8+0 records out 8 bytes copied, 0,0083022 s, 1,0 kB/s
Strictly speaking, stopping the server was not necessary. It would be enough for the page to have been written to disk and evicted from the cache (otherwise the server would keep working with the cached copy). But that scenario is harder to reproduce.
Now we start the server and try to read the table.
student$ sudo pg_ctlcluster 11 main start
=> SELECT * FROM wallevel;
WARNING: page verification failed, calculated checksum 23222 but expected 50884 ERROR: invalid page in block 0 of relation base/16386/24890
But what if the data cannot be restored from a backup? The ignore_checksum_failure parameter lets you attempt to read the table anyway, naturally at the risk of getting corrupted data.
=> SET ignore_checksum_failure = on; => SELECT * FROM wallevel;
WARNING: page verification failed, calculated checksum 23222 but expected 50884 n --- 1 (1 row)
Of course, in this case everything works out, because we damaged only the page header and not the data itself.
One more point. When checksums are enabled, hint bits are written to the log (we examined them earlier), because a change to any bit, even a non-essential one, also changes the checksum. With checksums disabled, writing hint bits to the log is controlled by the wal_log_hints parameter.
Changes to hint bits are always logged as a full page image (FPI), which considerably increases the size of the log. In this case it makes sense to enable compression of full page images with the wal_compression parameter (it appeared in version 9.5). We will look at specific numbers below.
Write atomicity
And finally, there is the problem of write atomicity. A database page is at least 8 KB (it can be 16 or 32 KB), while at the low level writes are performed in blocks that are usually smaller (typically 512 bytes or 4 KB). So in the event of a power failure a data page may be written only partially. Obviously it makes no sense to apply ordinary log records to such a page during recovery.
To protect against this, PostgreSQL writes a full image of the page to the log the first time the page is modified after a checkpoint (the same kind of image is written when hint bits change). This is controlled by the full_page_writes parameter, which is enabled by default.
If a page image is encountered in the log during recovery, it is written to disk unconditionally (without checking the LSN): it can be trusted more, because, like any log record, it is protected by a checksum. Ordinary log records are then applied to this known-good image.
Although PostgreSQL excludes unallocated space from the full page image (we looked at the page layout earlier), the volume of generated log records still grows considerably. As already mentioned, the situation can be improved substantially by compressing full page images (the wal_compression parameter).
To get a feel for how the log size changes, let's run a simple experiment with the pgbench utility. Initialize it first:
student$ pgbench -i test
dropping old tables... creating tables... generating data... 100000 of 100000 tuples (100%) done (elapsed 0.15 s, remaining 0.00 s) vacuuming... creating primary keys... done.
The
full_page_writes parameter
is enabled:
=> SHOW full_page_writes;
full_page_writes ------------------ on (1 row)
Perform a checkpoint and immediately run a 30-second test.
=> CHECKPOINT; => SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/38E04A08 (1 row)
student$ pgbench -T 30 test
starting vacuum...end. transaction type: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 30 s number of transactions actually processed: 26851 latency average = 1.117 ms tps = 895.006720 (including connections establishing) tps = 895.095229 (excluding connections establishing)
=> SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/3A69C478 (1 row)
Log Record Size:
=> SELECT pg_size_pretty('0/3A69C478'::pg_lsn - '0/38E04A08'::pg_lsn);
pg_size_pretty ---------------- 25 MB (1 row)
Now turn off the full_page_writes parameter:
=> ALTER SYSTEM SET full_page_writes = off; => SELECT pg_reload_conf();
And repeat the experiment.
=> CHECKPOINT; => SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/3A69C530 (1 row)
student$ pgbench -T 30 test
starting vacuum...end. transaction type: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 30 s number of transactions actually processed: 27234 latency average = 1.102 ms tps = 907.783080 (including connections establishing) tps = 907.895326 (excluding connections establishing)
=> SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/3BE87658 (1 row)
Log Record Size:
=> SELECT pg_size_pretty('0/3BE87658'::pg_lsn - '0/3A69C530'::pg_lsn);
pg_size_pretty ---------------- 24 MB (1 row)
Yes, the size decreased, but not nearly as much as one might expect.
The reason is that the cluster was initialized with data page checksums, so full page images still have to be written to the log whenever hint bits change. This data (in our case) accounts for about half the total volume, as we can see from the statistics:
postgres$ /usr/lib/postgresql/11/bin/pg_waldump --stats -p /var/lib/postgresql/11/main/pg_wal -s 0/3A69C530 -e 0/3BE87658
Type               N       (%)   Record size      (%)   FPI size       (%)
----               -       ---   -----------      ---   --------       ---
XLOG            1721   (  1,03)        84329  (  0,77)  13916104  (100,00)
Transaction    27235   ( 16,32)       926070  (  8,46)         0  (  0,00)
Storage            1   (  0,00)           42  (  0,00)         0  (  0,00)
CLOG               1   (  0,00)           30  (  0,00)         0  (  0,00)
Standby            4   (  0,00)          240  (  0,00)         0  (  0,00)
Heap2          27522   ( 16,49)      1726352  ( 15,76)         0  (  0,00)
Heap          109691   ( 65,71)      8169121  ( 74,59)         0  (  0,00)
Btree            756   (  0,45)        45380  (  0,41)         0  (  0,00)
              --------               --------               --------
Total         166931                10951564  [44,04%]  13916104  [55,96%]
For compactness I removed the all-zero rows from the table. Pay attention to the Total line and compare the size of full page images (FPI size) with the size of ordinary records (Record size).
The full_page_writes parameter can be disabled only if the file system and hardware in use guarantee atomic writes on their own. But, as we can see, there is not much point in doing so (assuming checksums are enabled).
Now let's see how compression helps.
=> ALTER SYSTEM SET full_page_writes = on; => ALTER SYSTEM SET wal_compression = on; => SELECT pg_reload_conf();
Repeat the same experiment.
=> CHECKPOINT; => SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/3BE87710 (1 row)
student$ pgbench -T 30 test
starting vacuum...end. transaction type: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 30 s number of transactions actually processed: 26833 latency average = 1.118 ms tps = 894.405027 (including connections establishing) tps = 894.516845 (excluding connections establishing)
=> SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn --------------------------- 0/3CBD3EA8 (1 row)
Log Record Size:
=> SELECT pg_size_pretty('0/3CBD3EA8'::pg_lsn - '0/3BE87710'::pg_lsn);
pg_size_pretty ---------------- 13 MB (1 row)
Conclusion: when there are many full page images (because of checksums or full_page_writes, that is, almost always), it most likely makes sense to use compression, even though it loads the CPU.
Performance
During normal server operation, log files are written continuously and sequentially. Since there is no random access, even ordinary HDDs cope with this load. But this access pattern is very different from the way data files are accessed.
It is therefore usually beneficial to put the log on a separate physical disk (or disk array) mounted into the server's file system. Instead of the $PGDATA/pg_wal directory, create a symbolic link pointing to the corresponding directory.
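A minimal sketch of such a move (assuming /wal is a hypothetical mount point of the separate disk, already writable by the postgres user):
student$ sudo pg_ctlcluster 11 main stop
postgres$ mv /var/lib/postgresql/11/main/pg_wal /wal/pg_wal
postgres$ ln -s /wal/pg_wal /var/lib/postgresql/11/main/pg_wal
student$ sudo pg_ctlcluster 11 main start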
There are a couple of situations in which log files have to be not only written but also read. The first is the obvious case of crash recovery. The second is less trivial: it occurs when streaming replication is used and the replica does not manage to receive log records while they are still in the primary server's RAM buffers. The walsender process then has to read the required data from disk. We will discuss this in more detail when we get to replication.
The log is written in one of two modes:
- synchronous - when a transaction commits, work cannot continue until all log records about that transaction are on disk;
- asynchronous - the transaction completes immediately, and the log is written in the background.
Synchronous mode is determined by the
synchronous_commit parameter and is enabled by default.
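The parameter can also be changed for a single transaction with SET LOCAL; a minimal sketch (the chat_messages table is made up for the example, and this particular commit will not wait for the log to be flushed to disk):
=> CREATE TABLE chat_messages(msg text);
=> BEGIN;
=> SET LOCAL synchronous_commit = off;
=> INSERT INTO chat_messages VALUES ('hi');
=> COMMIT;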
Since synchronization involves real (that is, slow) I/O, it pays to do it as rarely as possible. For this purpose, the backend process that commits the transaction and writes the log takes a short pause, determined by the commit_delay parameter. This happens, however, only if the system has at least commit_siblings active transactions.
The idea is that during this delay some other transactions will manage to complete, making it possible to synchronize their records in a single pass. It is like holding the elevator doors so that someone has time to hop into the cab. By default commit_siblings = 5 and commit_delay = 0, so no waiting actually happens. Changing commit_delay makes sense only in systems that perform a very large number of short OLTP transactions.
The process then flushes the log to disk up to the required LSN (or a little further, if new records were added while it waited). After that the transaction is considered committed.
With synchronous commit, durability is guaranteed (the D in the ACID acronym): if a transaction is committed, all of its log records are already on disk and will not be lost. The flip side is that synchronous writing increases the response time (the COMMIT command does not return control until synchronization finishes) and reduces system throughput.
Asynchronous commit is obtained by setting synchronous_commit = off (or local).
With asynchronous commit, the wal writer process flushes log records, alternating with wait cycles (whose length is set by the wal_writer_delay parameter, 200 ms by default).
Waking up after a wait, the process checks whether completely filled WAL pages have appeared since last time. If they have, it ignores the current, incomplete page and writes only the completely filled ones. (Not always all of them at once, though: the write stops on reaching the end of the cache and continues from the beginning of the cache the next time.)
If no page has been filled, the process writes the current (incomplete) page of the log - it did not wake up for nothing, after all.
This algorithm tries to avoid synchronizing the same page several times, which matters under a heavy stream of changes.
Asynchronous commit is more efficient than synchronous commit, since committing changes does not wait for the write. But reliability decreases: committed data can disappear in a crash if less than 3 × wal_writer_delay elapsed between the commit and the crash (which, with the default settings, is a little more than half a second).
The difficult choice - efficiency or reliability - is left to the system administrator.
Note: unlike disabling synchronization (fsync = off), asynchronous commit does not make recovery impossible. After a crash the system will still reach a consistent state, but some of the most recent transactions may be missing from it.
The synchronous_commit parameter can be set for individual transactions. This makes it possible to trade reliability for performance for only part of the workload. Say, financial transactions must always be committed synchronously, while chat messages can occasionally be sacrificed (as in the sketch shown earlier).
In reality the two modes work together. Even with synchronous commit, the log records of a long transaction will be written asynchronously to free WAL buffers. And if, while flushing a page from the buffer cache, it turns out that the corresponding log record is not yet on disk, it is immediately flushed in synchronous mode.
To get an idea of what asynchronous commit buys us, let's repeat the pgbench test in this mode.
=> ALTER SYSTEM SET synchronous_commit = off; => SELECT pg_reload_conf();
student$ pgbench -T 30 test
starting vacuum...end. transaction type: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 30 s number of transactions actually processed: 45439 latency average = 0.660 ms tps = 1514.561710 (including connections establishing) tps = 1514.710558 (excluding connections establishing)
With synchronous commit we got roughly 900 transactions per second (tps), with asynchronous commit about 1500. Of course, in a real system under a real load the ratio will be different, but it is clear that for short transactions the effect can be very noticeable.
This brings the series of articles on write-ahead logging to an end. If something important was left behind the scenes, please do not hesitate to mention it in the comments. Thanks to everyone!
And next we are in for exciting adventures in the world of locks, but that is another story.