Many conversations about backups begin with the saying that people fall into two categories... and so I belong to those who make backups. Properly configured backups and regular validation of those backups make for sound sleep. And having recovery instructions written and tested in advance generally improves digestion and immunity. While working with PostgreSQL I have had to set up backups many times, under very different conditions and requirements, yet with rare exceptions the set of tools has stayed the same. In this article I will share my experience with taking PostgreSQL backups.
If you think of backup as a very specific task, there are two simple questions:
1. Where should the backup be run from?
2. What tools should be used for the backup?
There are two possible answers to the first question: run the backup task from a dedicated backup server, which in my opinion is the most suitable option, or run it directly from the database server when there is no dedicated backup server.
Things are much more interesting with the tools. Here I distinguish two groups: main tools and auxiliary ones. The main ones are those that actually perform the backup; the auxiliary ones add something extra to the process, such as archiving, encryption, load management, and so on.
PostgreSQL ships with utilities for taking backups: pg_dump / pg_dumpall and pg_basebackup. In addition, file copy utilities such as rsync, tar, cp, etc. can be used.
So, which tool should be used for which kind of backup?
pg_dump - suitable for cases when you need a backup copy of a table, database, schema, or just the data (a quick example follows this list).
pg_basebackup - suitable for cases when you need to back up the entire database cluster or set up a hot standby replica.
rsync / tar / cp - also used for copying the entire cluster.
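For illustration, a single database might be dumped with pg_dump roughly like this (the database name exampledb and the output path are assumptions, shown only as a sketch):
backup@backup ~ $ pg_dump -h db01.example.com -U backup -F c -f /backup/exampledb.dump exampledb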
When PostgreSQL 9.0 had just been released, backups were made with rsync, but already in 9.1 pg_basebackup appeared, which has several advantages over rsync:
- pg_basebackup does not require ssh access, but it does require database access to be configured in pg_hba.conf;
- pg_basebackup is richer in functionality (copying WAL, creating recovery.conf, built-in gzip compression, etc.);
- pg_basebackup does not require separate pg_start_backup / pg_stop_backup calls, which are needed when using rsync / tar / cp;
- pg_basebackup copies faster than rsync because it uses the streaming replication protocol.
but there are some drawbacks:
- pg_basebackup ships with PostgreSQL, so it requires postgres to be installed;
- pg_basebackup has no built-in way to limit the copying speed (this is promised only in 9.4);
- pg_basebackup requires wal_level = hot_standby and max_wal_senders to be set in postgresql.conf (example settings are shown below).
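For reference, the corresponding settings might look roughly like this (the backup server address and the number of senders are illustrative assumptions):
# postgresql.conf
wal_level = hot_standby
max_wal_senders = 2
# pg_hba.conf - allow the backup user to connect for replication
host    replication     backup          10.0.0.10/32            md5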
Here I will consider pg_basebackup, although pg_dump can also be used in the methods listed below.
1. A simple, no-frills backup from the backup server to the /backup directory (the directory must be created beforehand):
backup@backup ~ $ pg_basebackup -x -h db01.example.com -U backup -D /backup
2. Copying with low-priority I/O operations using ionice, for cases when you need to reduce the disk I/O load created by the backup:
postgres@db01 ~ $ ionice -c 3 pg_basebackup -x -h db01.example.com -U backup -D /backup
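If the CPU priority also needs to be lowered, nice can be combined with ionice in the same way; this is just a variant of the command above, not a separate recipe:
postgres@db01 ~ $ nice -n 19 ionice -c 3 pg_basebackup -x -h db01.example.com -U backup -D /backup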
3. Copying with bzip2 compression, for cases when you want a compression algorithm other than pg_basebackup's built-in gzip. Here the data is passed via standard output (stdout) to the standard input (stdin) of bzip2.
backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |bzip2 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2
4. Copying with multi-threaded compression (lbzip2 using 6 cores). In this scenario you can put idle cores to work and speed up compression.
backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |lbzip2 -n 6 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2
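To unpack such an archive later, the same lbzip2 can be used for decompression (the file name and target directory here are only examples):
backup@backup ~ $ lbzip2 -d -c /backup/db01/backup-2014-05-09.tar.bz2 |tar xf - -C /backup/restore/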
5. Here copying is started on the database server. The generated backup is sent to the remote server via ssh.
postgres@db01 ~ $ pg_basebackup -x --format=tar -h 127.0.0.1 -U backup -D - |ssh backup@backup.example.com "tar xf - -C /backup/"
6. Here, too, copying is started on the database server and sent to the remote server, but this time compressed in 6 threads using lbzip2.
postgres@db01 ~ $ pg_basebackup -x --format=tar -h 127.0.0.1 -U backup -D - |ssh backup@backup.example.com "lbzip2 -n 6 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2"
7. Copying to a remote server with the transfer rate limited to 10 MB/s using pv, with compression on the remote side. This option is for cases when the backup must not saturate the network. The command is started on the database server:
postgres@db01 ~ $ pg_basebackup -x --format=tar -h 127.0.0.1 -U backup -D - |pv -r -b -L 10M |ssh backup@backup.example.com "bzip2 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2"
It is worth noting here that starting with 9.4, pg_basebackup itself can limit the transfer rate (-r, --max-rate).
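For example, with a 9.4 pg_basebackup the pv-based limiting above could be replaced with something like this (the 10M value is just an example):
backup@backup ~ $ pg_basebackup -x --max-rate=10M -h db01.example.com -U backup -D /backup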
8. Copying is started on the backup server, and the stream is then split in two: one stream is compressed with bzip2 (the backup itself), and the second is unpacked via tar into a temporary directory for subsequent validation. The method is rarely used, but the implementation itself is interesting.
backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |tee >(bzip2 -9 -c > /backup/db01/backup-$(date +%d-%b-%Y).tar.bz2) |tar xf - -C /backup/validation/
9. Copying with lbzip2 on both nodes, for cases when the network bandwidth is low: the stream is compressed first, then sent over the network, and then decompressed on the remote side. Here tar is used, so pg_start_backup('label_name') must be called on the postgres side beforehand (and pg_stop_backup() afterwards); a sketch of such a pipeline is shown below.
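A minimal sketch of such a pipeline run as postgres on the master, assuming a /var/lib/postgresql/9.3/main data directory, the label name backup_label, and two compression threads on each side (all of these are assumptions), might look like this:
postgres@master ~ $ psql -c "SELECT pg_start_backup('backup_label');"
postgres@master ~ $ tar cf - /var/lib/postgresql/9.3/main |lbzip2 -n 2 |ssh backup@backup.example.com "lbzip2 -n 2 -d |tar xf - -C /backup/db01/"
postgres@master ~ $ psql -c "SELECT pg_stop_backup();"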
10. Backup with GPG encryption, for cases when the backup needs to be encrypted. The keys should be created beforehand with gpg --gen-key (in my case the key is named backup).
backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |gpg -r backup -e |bzip2 -9 > /backup/db01/backup-$(date +%d-%b-%Y).tar.bz2
To decrypt the backup, run this command.
backup@backup ~ $ bzcat /backup/backup-09-May-2014.tar.bz2 |gpg -r backup -d |tar xf - -C /example/dir/
That's all; let's summarize the tools:
- pg_basebackup is a utility for creating postgres backups;
- lbzip2 - bzip2 compression using several cores, for when you need to compress faster (analogues: pbzip2, pigz);
- ionice - adjusts the class and priority for the I/O scheduler (nice can also be used to adjust a process's priority for the CPU scheduler);
- pv - monitors the amount of data passing through a pipe; thus we use it to limit the amount of data transferred per unit of time (an analogue is throttle);
- tar is an archiving utility, needed for auxiliary purposes when bzip2 / gzip compression is not used;
- tee - reading from stdin with writing to stdout and other files (it is part of coreutils);
- gpg - solves the problem of encryption.
Thank you all for your attention!