
Master failure in a PostgreSQL cluster: what to do?

Greetings. Today I would like to talk about an unpleasant situation: failure of the master when using the built-in replication in PostgreSQL 9.x. So, suppose you have a cluster of two or more PostgreSQL servers and a meteorite suddenly falls on the master. It is logical to assume that one of the replicas will have to be promoted to master. This can be done in two ways.

1. Using a trigger file.


The replication setup manual says that in recovery.conf, among other things, you can (and should) specify the trigger_file parameter. Everything is simple here: as soon as you create the file specified in this parameter on the replica, PostgreSQL interrupts the recovery process (in our case, replication) and opens a new timeline.
This means that after the trigger file is created, the WAL segment counter changes not sequentially (say, from 000000010000000000000043 to 000000010000000000000044) but with a new timeline ID (to 000000020000000000000043).
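For reference, a minimal sketch of the relevant recovery.conf fragment on the replica (the host and file path here are illustrative, not taken from any particular setup):

standby_mode = 'on'
primary_conninfo = 'host=192.168.100.2 port=5432 user=postgres'
trigger_file = '/tmp/postgresql.trigger'

Promotion then amounts to creating that file on the replica:

# touch /tmp/postgresql.trigger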

The good news is that this method does not require a restart, everything happens on the fly. There is no downtime (not counting the time needed to change the configs on the clients), all connections are preserved: PostgreSQL simply kills the walreceiver process and allows writes.
The bad news is that this method is only good if you have just two servers, the master and one replica, because if the cluster consists of three or more machines, you will not be able to make this node the new master without rebuilding the other replicas: when you try to attach another replica to the new master, PostgreSQL invariably says the following:

FATAL: timeline 2 of the primary does not match recovery target timeline 1

All attempts to feed the replicas the history file (which stores the point of transition to the new timeline; this file is created every time the recovery process completes) also failed. The participants of the official mailing list hold the same view: with this approach, the other replicas have to be re-synced (in 9.0 using pg_start_backup/pg_stop_backup and rsync, and in 9.1 using the pg_basebackup utility).
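For the 9.1 case, such a re-sync might look roughly like this (a sketch only; the address of the new master and the data directory are assumptions, and PostgreSQL on the replica must be stopped and its old data directory cleared first):

# pg_basebackup -h 192.168.100.3 -U postgres -D /var/lib/pgsql/data -x -P

After that, recreate recovery.conf on the replica pointing at the new master and start PostgreSQL again.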

2. Removing recovery.conf


For some reason I could not find a description of the second method in the manuals (perhaps it is there and I was simply not attentive enough, I will not argue). I think you will agree that it is simple, logical and apparently reliable (at least I never managed to break anything for good during repeated experiments):

1. Find the most up-to-date replica in the entire cluster. This can be done from the console by running something like this on each host:

# ps aux|grep postg|grep rec
postgres 143 0.0 13.2 8692004 6533448 ? Ss Feb06 3:58 postgres: startup process recovering 00000001000001E500000054
postgres 2683 0.0 0.0 8699452 4044 ? Ss Feb09 0:33 postgres: wal receiver process streaming 1E5/542F9970
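Alternatively (this is a sketch, not part of the original procedure), you can ask each replica for its WAL positions directly via SQL and compare them:

# psql -U postgres -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"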


If the meteorite has not yet hit your master but you expect it any minute, it is better to shut the master node down in advance so that the binary log position does not change during your manipulations.
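For example (a sketch, assuming the default data directory location):

# pg_ctl -D /var/lib/pgsql/data stop -m fast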

2. On the chosen replica, change postgresql.conf so that the node can act as a master (the parameters listed below are taken from the replication setup manual; in your case the values may, of course, differ):
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'


3. Edit pg_hba.conf so that the replicas can connect for replication:
host replication postgres 192.168.100.3/32 trust

4. Delete recovery.conf on the new master.
5. Restart PostgreSQL on the new master.
6. Edit recovery.conf on the other replicas (point them at the new master) and restart them (a sketch of such an edit follows below).
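The edit in step 6 might look roughly like this (the address of the new master and the data directory are assumptions for illustration):

standby_mode = 'on'
primary_conninfo = 'host=192.168.100.3 port=5432 user=postgres'

# pg_ctl -D /var/lib/pgsql/data restart -m fast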

In this simple way you can turn a replica into a master without losing the binary log position. The obvious drawback is that the entire cluster has to be restarted (although if you can move the IP address from the old master to the new one, the replicas will not need to be restarted).

If for some reason you want to make the new master out of a replica whose binary log position is not the most recent, you will first have to promote the most up-to-date replica to master (preferably closing it off from stray write connections), synchronize it with the replica you ultimately want to turn into the master, and only then perform all of the operations described above on that replica (that is, in effect, you will promote a master twice: first for one replica only, and then for all the others).

All in all, the only problem spot I managed to find during the experiments is the situation where one of the replicas has lagged so far behind that the new master no longer has the XLOG files it needs. I described how to deal with this in my previous post; I can only add that if you ship the binary logs to a backup server as part of archiving, this problem can hardly be called significant.
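For example, if the WAL archive lives on a separate backup server, a lagging replica could pull the missing segments via restore_command in its recovery.conf (a sketch; the host name and path are assumptions):

restore_command = 'scp backup-host:/path_to/archive/%f %p'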

Source: https://habr.com/ru/post/137932/

