
Backup Methods Comparison

Preparing a new server for production should begin with setting up backups. Everyone seems to know this, yet even experienced system administrators sometimes make unforgivable mistakes. The reason is not only that the task of setting up a new server has to be solved very quickly, but also that it is often unclear which backup method to use.


Of course, there is no ideal method that suits everyone: each has its pros and cons. But it is quite realistic to choose the method that best fits the specifics of a particular project.
When choosing a backup method, pay attention first of all to the following criteria:
  1. The speed (time) of creating a backup and placing it in storage;
  2. The speed (time) of recovery from a backup;
  3. How many copies can be kept given a limited amount of storage (on the backup storage server);
  4. The risks arising from inconsistent backups, from a poorly debugged backup procedure, or from the complete or partial loss of backups;
  5. Overhead: the load created on the server while copying, the slowdown of service response times, and so on;
  6. The cost of renting all the services involved.


In this article, we will discuss the basic ways of backing up servers running Linux and the most common problems that newcomers may encounter in this very important area of system administration.

Organizing backup storage and recovery


When choosing a scheme for organizing backups, pay attention to the following basic points:
  1. Backups must not be stored in the same place as the data being backed up. If you keep a backup on the same disk array as your data, you will lose it if the main disk array is damaged.
  2. Mirroring (RAID1) is not a substitute for backup. RAID only protects you from a hardware failure of one of the disks (and sooner or later such a failure will happen, since the disk subsystem is almost always the server's bottleneck). In addition, with hardware RAID there is the risk of a controller failure, which means you need to keep a spare of the same model.
  3. If you keep backups in the same rack in the data center, or simply in the same data center, there are certain risks in that situation as well.
  4. If you keep backups in different data centers, network costs and the time to recover from the remote copy increase dramatically.


A frequent reason for data recovery is file system or disk corruption, which means backups need to be stored on a separate storage server. In that case, the problem may be the bandwidth of the data transmission channel. If you have a dedicated server, it is highly desirable to send backups over a separate network interface rather than the one that exchanges data with clients. Otherwise your clients' requests may not fit into the limited communication channel, or, because of client traffic, backups will not complete on time.

Data transfer

Next, think through the recovery scheme and the recovery time implied by where the backups are stored. It may be perfectly acceptable that the backup runs for 6 hours at night to a storage with limited access speed, but a 6-hour recovery will probably not suit you. So access to backups must be convenient, and the data must copy back quickly enough. For example, restoring 1 TB of data over a 1 Gb/s channel takes almost 3 hours, and that is only if you do not run into the performance limits of the disk subsystem in the storage or on the server. And do not forget to add the time to detect the problem, the time to decide on a rollback, the time to verify the integrity of the recovered data, and the amount of subsequent dissatisfaction among customers and colleagues.
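The arithmetic behind that estimate is a simple back-of-the-envelope calculation, ignoring protocol and disk overhead:

 # 1 TB = 8 * 10^12 bits; at 10^9 bits per second:
 $ echo $(( 8 * 10**12 / 10**9 ))
 8000
 # 8000 seconds is about 2.2 hours in theory; real-world overhead
 # brings it closer to the 3 hours mentioned above.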

Incremental backup


With incremental backup, only files that have changed since the previous backup are copied. The next incremental backup adds only files changed since the previous one. On average, incremental backups take less time because fewer files are copied. However, recovery takes longer, because the last full backup must be restored, plus all subsequent incremental backups. At the same time, unlike differential copying, changed or new files do not replace the old ones but are added to the media independently.


Incremental backups are most often done with the rsync utility. It can save storage space if the number of changes per day is not too large. However, if large files change, they will be copied in full, without replacing their previous versions.

The backup process with rsync can be divided into the following steps (a usage sketch follows the list):
  1. A list of files is compiled on the server being backed up and in the storage; for each file, metadata (permissions, modification time, etc.) or a checksum (with the --checksum option) is read.
  2. If a file's metadata differs, the file is split into blocks and a checksum is computed for each block. Only the blocks that differ are transferred to the storage.
  3. If the file changes while checksums are being computed or while it is being transferred, it is backed up again from the beginning.
  4. By default, rsync transfers data over SSH, which means each data block is additionally encrypted. rsync can also be run as a daemon and transfer data over its own protocol without encryption.
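To make the scheme concrete, here is a minimal sketch of a daily incremental rsync backup that hard-links unchanged files to the previous day's copy (the host name and paths are assumptions for illustration; GNU date is assumed):

 #!/bin/bash
 # Each day gets its own directory in the storage; files unchanged since
 # yesterday are hard-linked by --link-dest instead of being copied again.
 SRC=/home/
 DST=backup.example.com:/srv/backup
 TODAY=$(date +%F)
 YESTERDAY=$(date -d yesterday +%F)
 rsync -a --delete \
       --link-dest="/srv/backup/$YESTERDAY" \
       "$SRC" "$DST/$TODAY/"

On the first run, rsync will warn that the --link-dest directory does not exist and simply copy everything in full.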


For more information on how rsync works, see the official website.

rsync performs a very large number of operations for each file. If there are many files on the server or if the processor is heavily loaded, backup speed will drop significantly.

From experience we can say that problems on SATA disks (RAID1) begin at roughly 200 GB of data on the server. In fact, everything ultimately depends on the number of inodes, and in each particular case this threshold may shift in one direction or the other.

Past a certain point, the backup will take very long, or simply will not finish within a day.

To avoid comparing all files, there is lsyncd. This daemon collects information about changed files, i.e. a file list for rsync is ready in advance. Note, however, that it puts additional load on the disk subsystem.
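A minimal lsyncd configuration might look like this (a sketch assuming lsyncd 2.x; the host and paths are illustrative):

 -- /etc/lsyncd/lsyncd.conf.lua: lsyncd is configured in Lua
 settings {
     logfile    = "/var/log/lsyncd.log",
     statusFile = "/var/log/lsyncd.status"
 }
 sync {
     default.rsync,
     source = "/home",
     target = "backup.example.com:/srv/backup/home",
     delay  = 15  -- seconds to aggregate change events before syncing
 }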

Differential backups


With differential backup, every file that has changed since the last full backup is copied each time. Differential backup speeds up recovery: all you need is the last full backup and the last differential backup. The popularity of differential backup is growing, since each copy of the files corresponds to a specific point in time, which is very important, for example, in case of a virus infection.


Differential backups can be made, for example, with a utility such as rdiff-backup. Working with it raises the same problems as with incremental backup.
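For reference, typical rdiff-backup invocations look like this (the host and paths are illustrative):

 # Back up /home; the storage keeps a mirror of the latest state
 # plus reverse diffs that let you go back to older versions.
 $ rdiff-backup /home backup.example.com::/srv/backup/home

 # Restore a single file as it was 10 days ago.
 $ rdiff-backup -r 10D backup.example.com::/srv/backup/home/user/file.txt /home/user/file.txt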

In general, if a full traversal of the files is performed when looking for changed data, the problems of this kind of backup are similar to those of rsync.

We would like to note separately: if your backup scheme copies each file individually, you should delete or exclude files you do not need. CMS caches are a typical example: they usually contain a great many small files whose loss does not affect the correct operation of the server.
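With rsync, for example, this is done with exclusion patterns (the paths are illustrative):

 # Skip CMS cache directories: losing them is harmless, and their huge
 # number of small files slows the backup down considerably.
 $ rsync -a --exclude='var/cache/' --exclude='*/tmp/' /home/ backup.example.com:/srv/backup/home/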

Full backup


A full backup usually covers your entire system and all files. Weekly, monthly, and quarterly backups imply creating a full copy of all data. It is usually done on Fridays or over the weekend, when copying a large amount of data does not affect the organization's work. Subsequent backups, run from Monday through Thursday until the next full backup, may be differential or incremental, mainly to save time and space on the media. A full backup should be made at least weekly.


Most publications on the subject recommend performing a full backup once or twice a week and using incremental or differential backup the rest of the time. There is a reason for such advice: in most cases a full backup once a week is enough. It makes sense to run it more often when the storage side gives you no way to refresh the full backup and guarantee its correctness (this may be necessary, for example, when, for one reason or another, you do not trust your existing backup scripts or backup software).
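Wired into cron, such a schedule might look like this (the backup script names here are hypothetical):

 # /etc/cron.d/backup: full backup early Sunday, incrementals Mon-Sat
 # m h dom mon dow user command
 30 2 * * 0   root /usr/local/bin/backup-full.sh
 30 2 * * 1-6 root /usr/local/bin/backup-incremental.sh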

In fact, full backup can be divided into two kinds:
  1. Full backup at the file system level;
  2. Full backup at the device level.


Let us consider their characteristic features using an example:
 root@komarov:~# df -h
 Filesystem                      Size  Used Avail Use% Mounted on
 /dev/mapper/komarov_system-root 3.4G  808M  2.4G  25% /
 /dev/mapper/komarov_system-home 931G  439G  493G  48% /home
 udev                            383M  4.0K  383M   1% /dev
 tmpfs                           107M  104K  107M   1% /run
 tmpfs                           531M     0  531M   0% /tmp
 none                            5.0M     0  5.0M   0% /run/lock
 none                            531M     0  531M   0% /run/shm
 /dev/xvda1                      138M   22M  109M  17% /boot


We will back up only /home. Everything else can be quickly restored by hand. You can also deploy the server with a configuration management system and attach our /home to it.

Full backup at the file system level


A typical representative: dump.

The utility creates a "dump" of the file system. It can create not only a full but also an incremental backup. dump works with the inode table and "understands" the file structure (sparse files, for example, are stored compactly).
Creating a dump of a live, mounted file system is "stupid and dangerous", because the file system can change while the dump is being created. The dump should be made from a snapshot (a bit later we will discuss working with snapshots in more detail), or from an unmounted or frozen file system.

This scheme, too, depends on the number of files, and its execution time grows with the amount of data on the disk. At the same time, dump is faster than rsync.
If you need to restore not the whole backup but, for example, only a couple of accidentally damaged files, the restore utility may take too long to extract them.
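Typical dump/restore usage, as a sketch (the device name comes from the df output above; the storage path is an assumption; level 0 is a full dump, level 1 copies only what changed since the last lower-level dump):

 # Full (level 0) dump; -u records the run in /etc/dumpdates.
 $ sudo dump -0u -f /srv/backup/home.0.dump /dev/mapper/komarov_system-home
 # Incremental (level 1) dump: only the changes since the level 0 run.
 $ sudo dump -1u -f /srv/backup/home.1.dump /dev/mapper/komarov_system-home
 # Interactively pick and extract individual files from a dump.
 $ sudo restore -i -f /srv/backup/home.0.dump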

Full backup at the device level


  1. mdraid and DRBD
    In effect, RAID1 is assembled from a disk/RAID on the server and a network drive, and from time to time (at the chosen backup frequency) the additional disk is synchronized with the main disk/RAID on the server.

    The biggest plus is speed: the duration of synchronization depends only on the number of changes made since the last run.
    Such a backup system is used quite often, but few people realize that the backups obtained this way may be unusable, and here's why. When disk synchronization completes, the backup disk is detached. If we have, for example, a running DBMS that writes data to the local disk in batches, keeping intermediate data in its cache, there is no guarantee that the data will reach the backup disk at all. At best, we will lose some of the in-flight data. Therefore, such backups can hardly be considered reliable.
  2. LVM + dd
    Snapshots are a great tool for creating consistent backups. Before creating a snapshot, you need to flush the file system cache and your software's caches to the disk subsystem.


For example, with MySQL alone it will look like this (important: FLUSH TABLES WITH READ LOCK holds only while the session that issued it stays open, so all the steps must run inside a single mysql session; %s is the logical volume name placeholder from the original script):
 mysql> FLUSH TABLES WITH READ LOCK;
 mysql> FLUSH LOGS;
 mysql> system sync
 mysql> system lvcreate -s -p r -l100%FREE -n %s_backup /dev/vg/%s
 mysql> UNLOCK TABLES;


* Colleagues tell stories of how a "read lock" sometimes led to deadlocks, but in my memory this has never happened.

Then you can copy the snapshot to storage. The main things are to make sure the snapshot does not self-destruct during copying (an LVM snapshot becomes invalid once it fills up with changes), and not to forget that while a snapshot exists, write speed drops significantly.
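Copying the snapshot itself can be as simple as this sketch (the names are assumptions; the snapshot is the one created above):

 # Stream the read-only snapshot to the storage, compressing on the fly;
 # ssh encrypts the channel, gzip reduces the traffic.
 $ sudo dd if=/dev/vg/home_backup bs=4M | gzip -1 | \
       ssh backup.example.com 'cat > /srv/backup/home.img.gz'
 # Remove the snapshot as soon as the copy finishes, before it fills up.
 $ sudo lvremove -f /dev/vg/home_backup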

DBMS backups can be created separately (for example, using binary logs), eliminating the downtime needed to flush caches. You can also create dumps in the storage itself by running a DBMS instance there. Backing up different DBMSes is a subject for separate publications.

You can copy snapshots with resumable transfers (for example, rsync with a patch for copying block devices, bugzilla.redhat.com/show_bug.cgi?id=494313), block by block and without encryption (netcat, ftp). You can transfer the blocks compressed and mount them in the storage with AVFS, and mount the storage partition with backups on the server over SMB.

Compression removes the problems of transfer speed, channel saturation, and storage space. However, if you do not use AVFS on the storage side, restoring only part of the data will take a long time; and if you do use AVFS, you will run into its immaturity.
An alternative to block-level compression is squashfs: you can mount, for example, a Samba partition on the server and run mksquashfs on it, but this utility also works file by file, i.e. it depends on the number of files.

In addition, creating a squashfs consumes a lot of RAM, which can easily lead to the oom-killer being invoked.
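For completeness, the mksquashfs call itself is trivial (the mount point and paths are assumptions):

 # /mnt/server_home is the server's /home mounted here over the network;
 # the result is a single compressed image that can be loop-mounted later.
 $ sudo mksquashfs /mnt/server_home /srv/backup/home.squashfs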

Security


You need to protect yourself against the situation where the storage or your server is compromised. If the server is compromised, it is better that the user writing data to the storage has no rights to delete or modify files there.
If the storage is compromised, the backup user's rights on the server should likewise be limited as much as possible.

If the backup channel can be eavesdropped on, encryption is needed.
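One common way to address both concerns at once is to have the storage pull backups over SSH using a key locked to a single read-only command: the server-side user cannot touch the storage at all, the storage key cannot modify the server, and the channel is encrypted for free. A sketch using the rrsync helper script that ships with rsync (the paths and key are illustrative):

 # On the server, in ~backup/.ssh/authorized_keys: the storage's key may
 # only run rrsync, restricted to read-only access under /home.
 command="rrsync -ro /home",no-pty,no-port-forwarding ssh-ed25519 AAAA... storage-key

 # On the storage, pull the data; the remote path is relative to /home.
 $ rsync -a -e ssh backup@server.example.com:/ /srv/backup/home/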

Conclusion


Every backup system has its own advantages and disadvantages. In this article we have tried to highlight some of the nuances involved in choosing a backup system, and we hope they will help our readers.

In the end, when choosing a backup system for your project, test the selected backup type and pay attention to the criteria listed at the beginning of the article: the time to create a backup, the time to recover, how many copies fit into the storage, the risks of inconsistent or lost backups, the load on the server, and the cost of the services involved.
As ready-made backup solutions, you can use supload and our cloud storage.
Readers who cannot leave comments here are invited to our blog.

Source: https://habr.com/ru/post/226831/

