
Organizing a Backup Server with Linux, ZFS and rsync

TL;DR:
An article about setting up a backup server on Linux. A ZFS partition with deduplication and compression serves as the storage. Daily snapshots are taken and kept for a week (7 in total); monthly snapshots are kept for a year (12 more). Rsync serves as the transport: on the server it runs as a daemon, on the clients it is launched from crontab.

It so happened that I have a couple of servers running virtual machines under KVM. I wanted to back up the images of these machines over the network, in such a way that the following conditions were met:



Is it possible to combine all of this? Yes, and quite simply.
All of the computers in this article are servers, but it would be clumsy and long-winded to keep distinguishing between “the server that stores the backups” and “the server whose backups are stored by the server that stores backups”. So I will simply call the first one the server, and the second the client.

1. ZFS with compression and deduplication


The OS most familiar to me is Linux. Everything here should work, with little change, on Solaris and FreeBSD as well, where ZFS has long been available out of the box. But Linux is closer and dearer to me, and the project porting ZFS to it is already quite mature. In a year of experiments I had no noticeable problems with it. So I installed Debian Wheezy on the server, connected the official project repository and installed the necessary packages.

I created a pool, specifying that zfs will live on /dev/md1 and that I want this file system mounted at /mnt/backup:

# zpool create backup -m /mnt/backup /dev/md1 


From the device name /dev/md1 you can see that I use Linux software RAID. Yes, I know ZFS has its own way of creating mirrors, but since this machine already has one mirror (for the root partition) managed by the regular mdadm, I prefer to use mdadm for the second mirror as well.

I turned on deduplication and compression and made the snapshot directory visible:

 # zfs set dedup=on backup
 # zfs set compression=on backup
 # zfs set snapdir=visible backup


I put a snapshot script in /usr/local/bin:

 #!/bin/bash
 export LANG=C
 ZPOOL='backup'
 # Keep 7 daily snapshots; a snapshot taken on the 4th of the
 # month is kept for a year instead of being destroyed.
 NOWDATE=`date +20%g-%m-%d`     # today's date, e.g. 2014-08-04
 OLDDAY=`date -d -7days +%e`    # day of the month one week ago
 if [ $OLDDAY -eq '4' ]
 then
     OLDDATE=`date -d -1year-7days +20%g-%m-%d`  # a year and a week ago
 else
     OLDDATE=`date -d -7days +20%g-%m-%d`        # a week ago
 fi
 /sbin/zfs snapshot $ZPOOL@$NOWDATE
 /sbin/zfs destroy $ZPOOL@$OLDDATE 2>/dev/null


This script is added to crontab to run daily. So that a snapshot's contents correspond to its date, the script should run close to the end of the day, for example at 23:55.
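The crontab entry might look like the following (the script's file name is my assumption; the article does not give one):

```
# /etc/crontab: take the ZFS snapshot at 23:55 every day
55 23 * * * root /usr/local/bin/zfs-snapshot.sh
```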

The fourth day of the month was chosen almost by accident: I set all this up on the third of August and wanted to quickly make a backup that would be kept for a year. The next day was the fourth.
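The daily-versus-monthly decision in the script relies only on GNU date arithmetic, so it can be checked with fixed example dates (a sketch; the dates are mine, chosen so that “a week ago” falls on the 4th):

```shell
#!/bin/bash
export LANG=C
# On 2014-08-11, seven days earlier is 2014-08-04, i.e. day 4 of the month:
# the script above would then keep the week-old snapshot as a monthly one.
OLDDAY=$(date -d "2014-08-11 -7days" +%e | tr -d ' ')
if [ "$OLDDAY" -eq 4 ]; then
    echo "2014-08-11: keep the week-old snapshot as a monthly one"
else
    echo "2014-08-11: destroy the week-old snapshot"
fi
```

Run on any day other than the 11th-minus-7-equals-4 case (say, with 2014-08-12), the else branch fires and the week-old snapshot is destroyed.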

Snapshots are saved in the /mnt/backup/.zfs/snapshot directory. Each snapshot is a separate directory whose name is the date the snapshot was created. Inside each snapshot is a complete copy of the /mnt/backup directory as it was at that moment.

2. rsync on server


Traditionally, rsync is set up to work over ssh: key-based, passwordless authorization is configured so that the backup server can log into the clients via ssh and pull files from them. The advantage of this approach is traffic encryption. But I don't like the idea of passwordless ssh logins (especially in light of the recent vulnerabilities in bash). I also don't like initiating the backup from the server side: sometimes I want to run a script on the client before the backup (for example, to dump a mysql database) and start the backup only after that script finishes. So my choice is rsync running as a daemon on the server and launched from crontab on the clients.

I installed rsync on the server (the regular one, from the repository), and to have it start at boot I wrote in /etc/default/rsync:

 RSYNC_ENABLE=true 


I created /etc/rsyncd.conf on the server with the following contents:

 uid = nobody
 gid = nogroup
 use chroot = yes
 max connections = 10
 pid file = /var/run/rsyncd.pid

 [kvm01]
 path = /mnt/backup/kvm01
 comment = KVM01 backups
 hosts allow = 192.168.xxx.xxx
 hosts deny = *
 read only = no

 [kvm02]
 path = /mnt/backup/kvm02
 comment = KVM02 backups
 hosts allow = 192.168.xxx.yyy
 hosts deny = *
 read only = no


192.168.xxx.xxx and 192.168.xxx.yyy are the addresses of the servers to be backed up. Their names are kvm01 and kvm02, and their files will go to /mnt/backup/kvm01 and /mnt/backup/kvm02. Therefore:

 # mkdir /mnt/backup/kvm01
 # mkdir /mnt/backup/kvm02
 # chown nobody:nogroup /mnt/backup/kvm01
 # chown nobody:nogroup /mnt/backup/kvm02


Run rsync:

 # /etc/init.d/rsync start 


3. rsync on clients


The minimal script for copying files from the kvm02 client to the server at 192.168.xxx.zzz looks something like this:

 #!/bin/bash
 RSYNCBACKUPDIR="rsync://192.168.xxx.zzz/kvm02"
 LOCALDIR="/virt/files"
 rsync -vrlptD --delete $LOCALDIR $RSYNCBACKUPDIR


Of course, if we are talking about backing up virtual machines, this script should be extended with commands that create and remove an LVM snapshot, mount and unmount its contents, and so on. But that topic is beyond the scope of this article.
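As a rough illustration of that extension, the sequence could look like the sketch below. The volume group and volume names (vg0/virt), the snapshot size, and the DRYRUN wrapper are all my assumptions, not from the article; the wrapper just lets you print the commands instead of running them as root:

```shell
#!/bin/bash
# Sketch: back up a mounted LVM snapshot instead of the live volume.
# vg0/virt, the 10G snapshot size and the mount point are assumptions.
run() { if [ "${DRYRUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

backup_kvm02() {
    local rsyncdir="rsync://192.168.xxx.zzz/kvm02"
    local snapdir="/mnt/lvm-snapshot"
    run lvcreate --snapshot --size 10G --name backup-snap /dev/vg0/virt || return 1
    run mkdir -p "$snapdir"
    run mount -o ro /dev/vg0/backup-snap "$snapdir"
    run rsync -vrlptD --delete "$snapdir/" "$rsyncdir"
    run umount "$snapdir"
    run lvremove -f /dev/vg0/backup-snap
}

# Dry run: print the planned commands without touching LVM.
DRYRUN=1 backup_kvm02
```

With DRYRUN=1 the function only echoes the six commands it would run; without it, the script must be run as root on a machine where the named volume actually exists.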

4. Recovery


To restore files from the KVM01 client's backup for August 4, 2014, it is enough to go to the /mnt/backup/.zfs/snapshot/2014-08-04/kvm01/ directory on the server and copy the files from there in any usual way. Each individual backup looks like a regular read-only directory, so you can search it with standard utilities such as find or grep.
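For example, pulling a single image out of that snapshot could look like this (the file name disk.img is my assumption):

```
# cp /mnt/backup/.zfs/snapshot/2014-08-04/kvm01/disk.img /var/tmp/
```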

5. Conclusion


There are now 9 snapshots on the server: 7 daily and 2 monthly, plus today's backup, whose snapshot will be taken in the evening. The backup partition is 1.8T in size. The total file size is 3.06T; physically the files occupy 318G on disk. Today's backup alone totals 319G. Yes: 10 backups on ZFS with compression and deduplication take up less space than a single backup would on a file system without these useful features.

 # zpool list
 NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
 backup  1.80T   310G  1.49T    16% 10.37x  ONLINE  -


 # zfs list
 NAME     USED  AVAIL  REFER  MOUNTPOINT
 backup  3.06T  1.42T   318G  /mnt/backup
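A quick sanity check of the combined savings, dividing the logical size (3.06T ≈ 3133G) by the physical size on disk (318G):

```shell
# Rough overall compression+dedup ratio from the zfs list figures above.
awk 'BEGIN { printf "overall ratio: %.1fx\n", 3.06 * 1024 / 318 }'
```

This gives roughly 9.9x, in the same ballpark as the 10.37x DEDUP column that zpool reports for deduplication alone.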


Since rsync itself does not encrypt the transmitted data, it would be unsafe to expose this scheme to the Internet as-is. Encryption can be added by running the traffic through ipsec or stunnel, for example.

I wrote above that I had no noticeable problems with ZFS. In fact, there was one. One night, while both clients were actively backing up, the server twice reported in dmesg that task rsync was blocked for more than 120 seconds. Both backups nevertheless completed successfully, nothing hung, and no data was lost. I suspect this was a manifestation of the notorious bug 12309. After I spread the backups out in time, the problem has not recurred.

Source: https://habr.com/ru/post/239513/
