
PostgreSQL slave + btrfs and systemd = hot test database



During active software development you often need a test database with up-to-date data from the production database. If the database is small, deploying a copy does not take long. But when the database holds dozens of gigabytes and complete testing needs all of it, and as fresh as possible, difficulties arise. In this article I describe a way to get around these troubles with the help of btrfs snapshots, while systemd, a convenient and capable tool, will manage the resulting setup.





Preparation



On the servers I used:
Debian jessie 4.7.0-0.bpo.1-amd64 #1 SMP Debian 4.7.8-1~bpo8+1
btrfs-progs v4.7.3
systemd 230
postgresql 9.6.1


But the exact versions are not really important.

On the btrfs partition dedicated to the database, create two subvolumes:



 btrfs subvolume create /var/lib/postgresql/slave
 btrfs subvolume create /var/lib/postgresql/snapshot


The first will hold the database data, the second will hold the snapshots.



First of all, you need to bring up a PostgreSQL slave that will normally hold a full copy of the master's data. The deployment procedure has been described many times, so briefly and to the point:
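The initial data copy itself can be taken with pg_basebackup; a minimal sketch, assuming the replicator role used in recovery.conf below already exists on the master with replication rights, and the slave subvolume created above:

 sudo -u postgres pg_basebackup -h master.host -p 5432 -U replicator \
     -D /var/lib/postgresql/slave/9.6/main -X stream -P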



Master postgresql.conf
 max_wal_senders = 3       # max number of walsender processes
                           # (change requires restart)
 wal_keep_segments = 64    # in logfile segments, 16MB each; 0 disables
 archive_mode = on         # allows archiving to be done
                           # (change requires restart)
 archive_command = 'test ! -f /var/lib/postgresql/9.6/main/archive/%f.7z && 7za a /var/lib/postgresql/9.6/main/archive/%f.7z -p______ -mx7 -mhe=on %p'


Thus, as soon as the next WAL segment is filled, it is archived into a dedicated archive directory inside the data folder. In my case these archived logs are then transferred over open channels and kept in several places for a while, which is why they are encrypted and password-protected. How long we keep the archived logs determines how far back we can restore the state of the slave. With an actively changing database the number of WAL logs can grow very quickly and eat free space, so archiving them pays off.



The archived logs have to be cleaned up yourself, for example:



 find /var/lib/postgresql/9.6/main/archive/ -type f -mtime +10 -print0 | xargs -0 -n 13 /bin/rm 
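This can be scheduled on the master, for example with a daily cron entry (a sketch; the file name and schedule are arbitrary):

 # /etc/cron.d/pg-archive-cleanup (hypothetical)
 30 3 * * * postgres find /var/lib/postgresql/9.6/main/archive/ -type f -mtime +10 -print0 | xargs -0 -n 13 /bin/rm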


Slave postgresql.conf
 data_directory = '/var/lib/postgresql/slave/9.6/main'   # use data in another directory
 hot_standby = on                                         # "on" allows queries during recovery


The data directory is placed one level below the standard location, in the "slave" subvolume, since /var/lib/postgresql is the mount point of the btrfs partition.



Slave recovery.conf
 standby_mode = 'on'
 primary_conninfo = 'host=master.host port=5432 user=replicator password='
 trigger_file = '/var/lib/postgresql/slave/9.6/main/trigger'
 restore_command = '7za e /mnt/backup/postgresql/archive/%f.7z -p______ -so > %p'


With this configuration the database starts in slave mode and receives changes directly from the master. If a needed WAL segment is no longer available on the master, it restores it from the /mnt/backup/postgresql/archive/ directory. The archived logs get there via rsync, which periodically fetches updates from the master. I run it in sync-with-delete mode, but you can keep them for an arbitrary period of time.
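A minimal sketch of such a periodic pull, assuming the backup host has ssh access to the master (the paths are the ones used above, the schedule is up to you):

 # pull the archived WAL files from the master, dropping what the master no longer has
 rsync -a --delete master.host:/var/lib/postgresql/9.6/main/archive/ /mnt/backup/postgresql/archive/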



The database also watches for the trigger file to appear; when it does, it switches to master mode and is ready to handle all queries itself. From that point on the master and the slave diverge. The neat idea is to take a snapshot of the slave before switching over and, after the necessary experiments, put it back in place as if nothing had happened. Once brought back up, the database will see that it has fallen behind the master, catch up using the archived logs, and return to regular slave mode. This operation takes considerably less time than a full restore from backup.



We implement the idea with systemd, which will not only automate the process but also take care of the necessary dependencies and unwanted conflicts. I assume you are already familiar with how systemd works and is configured, but I will try to describe the parameters used in detail.



Systemd setup



First of all, we mount the partition where the database data lives. In my case everything runs in an LXC container, where the btrfs partition is passed in as a block device at /dev/pg-slave-test-db. Create a systemd unit of the .mount type:



/etc/systemd/system/var-lib-postgresql.mount
 [Unit]
 Description=PostgreSQL DB dir on BTRFS
 Conflicts=umount.target

 [Mount]
 What=/dev/pg-slave-test-db
 Where=/var/lib/postgresql
 Type=btrfs
 Options=defaults
 TimeoutSec=30

 [Install]
 WantedBy=multi-user.target


The unit name is determined by the mount point. Conflicts=umount.target ensures the partition is unmounted on shutdown. One caveat: you can specify not only absolute paths but also the unique UUID of the device. However, when running inside the LXC container I hit an oddity - the UUID is not visible in the system until it is explicitly requested with the blkid command. That is why I use the absolute path.
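For reference, mounting by UUID would look roughly like this (a sketch; the UUID itself comes from blkid):

 blkid /dev/pg-slave-test-db

 # in the .mount unit, instead of the absolute device path:
 What=/dev/disk/by-uuid/<uuid-from-blkid>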



This unit can be started on its own or pulled in as a dependency, which is what we will do.



Create a systemd unit describing the launch of PostgreSQL in slave mode. The cluster must first be configured to start manually; on Debian this is done in the /etc/postgresql/9.6/main/start.conf file. Also, if you have the postgresql@9.6-main.service service, it needs to be disabled.
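A sketch of these two preparatory steps, assuming the stock Debian packaging:

 echo manual > /etc/postgresql/9.6/main/start.conf
 systemctl stop postgresql@9.6-main.service
 systemctl disable postgresql@9.6-main.service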



/etc/systemd/system/postgres-slave.service
 [Unit]
 Description=Restore PostgreSQL base snapshot and Switch the PostgreSQL base in slave mode.
 After=network.target var-lib-postgresql.mount postgresql.service
 Requires=var-lib-postgresql.mount
 Conflicts=postgres-master.service

 [Service]
 Type=oneshot
 RemainAfterExit=yes
 User=root
 TimeoutSec=300
 ExecStartPre=-/usr/bin/pg_ctlcluster -m fast 9.6 main stop
 ExecStartPre=/bin/bash -c 'if [[ -d "/var/lib/postgresql/snapshot/auto" ]]; then /bin/btrfs subvolume delete /var/lib/postgresql/slave; fi'
 ExecStartPre=/bin/bash -c 'if [[ -d "/var/lib/postgresql/snapshot/auto" ]]; then /bin/mv -v /var/lib/postgresql/snapshot/auto /var/lib/postgresql/slave; fi'
 ExecStart=/usr/bin/pg_ctlcluster 9.6 main start
 ExecStop=/usr/bin/pg_ctlcluster -m fast 9.6 main stop

 [Install]
 WantedBy=multi-user.target


Let's go through it in detail.



The After parameter sets the desired start order but, unlike Requires, does not oblige anything. The latter demands that the listed unit be active and will try to activate it; if that fails, our whole service goes to the "failed" state. The Conflicts parameter says our service cannot coexist with the listed one, so one of them has to be stopped. In this case, starting "postgres-slave.service" automatically stops "postgres-master.service" (described below), which is very convenient.



Type=oneshot says the service runs to completion and systemd should wait for it to finish. RemainAfterExit=yes leaves the service in the "active" state after the required actions succeed. Since we will be creating snapshots and managing the database, it is advisable to increase the timeout beyond the default.



What follows is essentially a script that brings the system to the desired state. The ExecStartPre commands run before the main ExecStart command, in order, each waiting for the previous one to finish successfully. The first command, however, has the "-" modifier, which lets execution continue regardless of its return code - after all, the database may already be stopped when the unit starts.



startup script:
 /usr/bin/pg_ctlcluster -m fast 9.6 main stop
 /bin/bash -c 'if [[ -d "/var/lib/postgresql/snapshot/auto" ]]; then /bin/btrfs subvolume delete /var/lib/postgresql/slave; fi'
 /bin/bash -c 'if [[ -d "/var/lib/postgresql/snapshot/auto" ]]; then /bin/mv -v /var/lib/postgresql/snapshot/auto /var/lib/postgresql/slave; fi'
 /usr/bin/pg_ctlcluster 9.6 main start


Absolute paths to the programs are required. Where the current state of the system needs to be checked during execution, bash is invoked to run a small script. So:



  1. First, stop the running database with the "-m fast" option, so as not to wait for all clients to disconnect.



  2. Check whether the "/var/lib/postgresql/snapshot/auto" directory exists; it holds the last snapshot of the database in slave mode and appears as a result of "postgres-master.service". If the snapshot is there, the database is currently in master mode, so delete the test database data (/var/lib/postgresql/slave).



  3. And restore the snapshot in its place.



  4. Once the preparatory commands succeed, start the database.


The last command, "ExecStop", defines how the service shuts down. The service is self-sufficient and can be enabled to start at boot.
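For example (a sketch):

 systemctl daemon-reload
 systemctl enable postgres-slave.service
 systemctl start postgres-slave.service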



Create a systemd unit describing the launch of PostgreSQL in master mode.



/etc/systemd/system/postgres-master.service
 [Unit]
 Description=Create PostgreSQL base Snapshot and Switch the PostgreSQL base in master mode.
 After=network.target var-lib-postgresql.mount postgresql.service
 Requires=var-lib-postgresql.mount
 Conflicts=postgres-slave.service

 [Service]
 Type=oneshot
 RemainAfterExit=yes
 User=root
 TimeoutSec=300
 ExecStartPre=-/usr/bin/pg_ctlcluster -m fast 9.6 main stop
 ExecStartPre=/bin/bash -c 'if [[ -d "/var/lib/postgresql/snapshot/auto" ]]; then /sbin/btrfs subvolume delete /var/lib/postgresql/snapshot/auto; fi'
 ExecStartPre=/bin/btrfs subvolume snapshot /var/lib/postgresql/slave /var/lib/postgresql/snapshot/auto
 ExecStartPre=/usr/bin/touch /var/lib/postgresql/slave/9.6/main/trigger
 ExecStart=/usr/bin/pg_ctlcluster 9.6 main start
 ExecStop=/usr/bin/pg_ctlcluster -m fast 9.6 main stop

 [Install]
 WantedBy=multi-user.target


It conflicts, of course, with "postgres-slave.service".



The executable script is as follows:
 /usr/bin/pg_ctlcluster -m fast 9.6 main stop
 /bin/bash -c 'if [[ -d "/var/lib/postgresql/snapshot/auto" ]]; then /sbin/btrfs subvolume delete /var/lib/postgresql/snapshot/auto; fi'
 /bin/btrfs subvolume snapshot /var/lib/postgresql/slave /var/lib/postgresql/snapshot/auto
 /usr/bin/touch /var/lib/postgresql/slave/9.6/main/trigger
 /usr/bin/pg_ctlcluster 9.6 main start


  1. Stop the running database.
  2. Check whether an old snapshot exists at "/var/lib/postgresql/snapshot/auto"; if it does, delete it.
  3. Create a new snapshot of the latest database data.
  4. Create the trigger file, which switches the database into master mode.
  5. Start the database.


From this point on you can run the boldest experiments on the data. And when you are done, starting the "postgres-slave.service" service puts everything back.
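Switching is just a matter of starting the corresponding unit; the Conflicts= directive takes care of stopping the other one:

 # take a snapshot and switch to master (test) mode
 systemctl start postgres-master.service

 # roll the snapshot back and return to slave mode
 systemctl start postgres-slave.service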



Switching the database between modes can be driven by cron, by any criteria you like, through dependent services, or even via a systemd ".socket" unit that brings the test database up on the first request from the application. Fortunately, systemd allows all of this.
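For example, a nightly refresh of the test data could be a single cron entry (a sketch; the file name and schedule are arbitrary):

 # /etc/cron.d/pg-test-refresh (hypothetical)
 0 6 * * * root systemctl start postgres-slave.service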



To monitor the current lag behind the master database, you can use the query:



 postgres=# select now() - pg_last_xact_replay_timestamp() AS replication_delay; 


It is easy to hook this up to Zabbix, for example.
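One way is a Zabbix agent UserParameter that returns the lag in seconds (a sketch; the key name is arbitrary, and the agent user must be allowed to connect to the database):

 # /etc/zabbix/zabbix_agentd.d/pg_lag.conf (hypothetical)
 UserParameter=pgsql.replication_lag,psql -U postgres -Atc "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0)"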



UPD pg_rewind



The comments raised numerous concerns about the stability of btrfs as such, and doubts about its suitability for storing a database in particular. So I suggest adapting the concept above to the capabilities of the pg_rewind utility. It finds the point where the master and the slave diverged and rolls the target database back to the last state before the divergence. But that is only a prerequisite: to recover the data, all WAL logs accumulated since then are needed. pg_rewind also downloads quite a lot of data from the master even if the databases have only just diverged (in my case ~10G). And the target database must be shut down cleanly before the synchronization.



So:



  1. In the test database configuration you must enable the wal_log_hints = on option, which pg_rewind requires and which may also affect performance. On the master, you must allow access from the test server for a user with superuser rights (a sketch of these settings follows the unit file below).



  2. From the systemd scripts we remove everything related to btrfs and working with the snapshot. "postgres-master.service" becomes quite simple and boils down to creating the trigger file.



  3. The "postgres-slave.service" script will look approximately like this:


/etc/systemd/system/postgres-slave.service
 [Unit]
 Description=Restore PostgreSQL base snapshot and Switch the PostgreSQL base in slave mode.
 After=network.target var-lib-postgresql.mount postgresql.service
 Requires=var-lib-postgresql.mount
 Conflicts=postgres-master.service

 [Service]
 Type=oneshot
 RemainAfterExit=yes
 User=root
 TimeoutSec=300
 ExecStartPre=-/usr/bin/pg_ctlcluster -m fast 9.6 main stop
 ExecStartPre=/bin/bash -c 'if [[ -e "/var/lib/postgresql/slave/9.6/main/recovery.done" ]]; then /usr/lib/postgresql/9.6/bin/pg_rewind -D /var/lib/postgresql/slave/9.6/main/ --source-server="host=master.host port=5432 user=postgres password=___" --progress ; /bin/cp /var/lib/postgresql/recovery.conf /var/lib/postgresql/slave/9.6/main/recovery.conf; fi'
 ExecStart=/usr/bin/pg_ctlcluster 9.6 main start
 ExecStop=/usr/bin/pg_ctlcluster -m fast 9.6 main stop

 [Install]
 WantedBy=multi-user.target
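As promised in item 1 above, the corresponding settings look roughly like this (a sketch; the test server address is a placeholder):

 # postgresql.conf on the test server (pg_rewind requires wal_log_hints on the target unless data checksums are enabled)
 wal_log_hints = on

 # pg_hba.conf on the master: allow the test server to connect as a superuser
 host    all    postgres    <test.server.ip>/32    md5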


Before pg_rewind is run, the database must be shut down cleanly.

If the database turns out to be in master state, we start synchronizing it with the main database:



  1. Run pg_rewind, specifying the target data directory (-D), the connection parameters to the master, and the flag for verbose progress output.
  2. Copy back the previously saved recovery.conf file, which is removed as a result of pg_rewind.
  3. Start the database, which will begin catching up with the master using the WAL logs.


In this variant, restoring the slave database takes noticeably longer, but there are no risks from using btrfs.



Thanks for your attention.




Source: https://habr.com/ru/post/317660/


