
Advantages of a new backup method for virtual machines over classic schemes


Let's dig into the details below.


The forward (direct) incremental method is usually the default, which is why it is the most widely used. The first run creates a full backup, and each later run saves an increment that is appended to the chain. To keep such a chain reliable and to limit recovery time (which grows linearly with the number of increments), a new full backup or a synthetic full backup has to be created periodically. The number of increments after which a new full backup is created is set in the parameters of the backup scheme. Schematically, the process looks like this:

[Figure: forward incremental backup]
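
To make the linear growth of recovery time concrete, here is a minimal sketch of a restore from such a chain. It assumes a backup can be modeled as a dictionary of block id to block contents; the function name `restore_forward` and the sample data are hypothetical and are not taken from any Veeam API.

```python
def restore_forward(full, increments, point):
    """Rebuild the disk image as of restore point `point` (0 = the full backup).

    Assumption: a backup file is modeled as {block_id: data}.
    """
    image = dict(full)               # start from the full backup
    files_read = 1
    for inc in increments[:point]:   # replay increments in chronological order
        image.update(inc)            # newer blocks overwrite older ones
        files_read += 1
    return image, files_read


full = {0: "a0", 1: "b0", 2: "c0"}
increments = [{0: "a1"}, {1: "b2"}, {0: "a3", 2: "c3"}]

image, files_read = restore_forward(full, increments, 3)
print(image, files_read)
```

Restoring the latest point touches the full backup plus every increment after it, so the number of files read is one more than the number of increments in the chain.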

The forward method provides fast data processing (I/O), because each saved data block requires only one read/write operation. Increment creation time and the lifetime of the virtual machine snapshot are short, which minimizes the load on the production environment. However, storage consumption is significant, because redundant data has to be kept. Why?
In practice, companies usually set a backup retention policy that governs either the number of available recovery points (full copies and increments) or the calendar storage period. In this case, the forward backup scheme must satisfy two conditions:
  1. The backup chain must be recoverable, i.e. it must include a full backup and all subsequent increments. If you delete part of the chain, you will not be able to restore data from it.
  2. The number of available recovery points must never be lower than the number set by the user.

Suppose the retention period is 7 days, with one recovery point per day. Suppose you have already created a complete chain of 7 recovery points, then the next full backup and, say, a couple of increments on top of it. Can you delete the first chain? No: if you delete it, only 3 recovery points remain, which violates condition 2 above. So the obsolete recovery points can be removed no earlier than day 14, hence the excessive storage.
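
The arithmetic of this example can be checked with a small simulation. This is only a sketch under stated assumptions: one backup run per day, a new full backup every `full_every` runs, and a chain that is deleted only as a whole once the remaining chains alone satisfy the retention setting. The function name and its parameters are hypothetical.

```python
def stored_points_per_day(days, retention, full_every):
    """Return how many recovery points sit on disk after each daily run."""
    chains = []    # each chain: a full backup followed by its increments
    history = []
    for day in range(days):
        if day % full_every == 0:
            chains.append(["full"])      # start a new chain
        else:
            chains[-1].append("inc")     # extend the current chain
        # An old chain may go only if the newer chains still hold
        # at least `retention` recovery points (condition 2).
        while len(chains) > 1 and sum(len(c) for c in chains[1:]) >= retention:
            chains.pop(0)
        history.append(sum(len(c) for c in chains))
    return history


history = stored_points_per_day(days=21, retention=7, full_every=7)
print(history)
```

With a retention of 7, the count climbs to 13 stored points before the first chain is finally dropped on the 14th run, which is the overhead described above.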

The reverse incremental method avoids this waste of disk space. The mechanism for creating such backups is a bit more complicated: each new increment is injected into the existing full backup, and the data blocks it replaces are saved out of the full copy as a reverse increment that precedes it in the chain.

[Figure: reverse incremental backup]

First, the reverse incremental method makes better use of storage, because there is always exactly one full backup and a chain of reverse increments preceding it (the “extra” increments are removed regularly according to the configured retention period). Second, recovery of the latest state from a reverse incremental backup takes minimal time, since the full copy always holds the most recent version of the data and no time is spent processing increments.

However, this algorithm has its own drawback: data processing slows down and the snapshot lifetime grows. Each saved data block requires 3 read/write operations: read the displaced block from the full copy, write that block to storage as part of a reverse increment, and then write the new block of changed data into the full copy. As a result, if the storage system cannot sustain this level of performance, the backup takes a long time, and the long-lived snapshot increases the load on the production environment.
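
The difference in I/O cost between the two methods can be sketched as follows. The model is deliberately simplified: a backup file is a dictionary of block id to contents, every changed block is assumed to already exist in the full backup, and all function names are hypothetical rather than part of any real product API.

```python
def forward_run(changed):
    """Forward incremental: one write per changed block."""
    increment = dict(changed)
    return increment, len(changed)


def reverse_run(full, changed):
    """Reverse incremental: three operations per changed block."""
    reverse_increment = {}
    io_ops = 0
    for block_id, new_data in changed.items():
        displaced = full[block_id]               # 1: read the old block from the full copy
        io_ops += 1
        reverse_increment[block_id] = displaced  # 2: write it into the reverse increment
        io_ops += 1
        full[block_id] = new_data                # 3: write the new block into the full copy
        io_ops += 1
    return reverse_increment, io_ops


def restore_previous(full, reverse_increment):
    """Roll the latest state back by one recovery point."""
    image = dict(full)
    image.update(reverse_increment)
    return image


full = {0: "a0", 1: "b0", 2: "c0"}
changed = {0: "a1", 2: "c1"}

_, forward_io = forward_run(changed)
reverse_increment, reverse_io = reverse_run(full, changed)
print(forward_io, reverse_io)                      # 2 6
print(full)                                        # the full copy now holds the latest data
print(restore_previous(full, reverse_increment))   # the previous state
```

The full copy is always current, so the latest restore reads a single file, but the same two changed blocks cost 6 operations instead of 2 while the snapshot is open.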

How to avoid compromises


Veeam Backup & Replication v8 implements the forever forward incremental backup method, which combines the strengths of the algorithms discussed above and delivers fast backups, fast recovery, and economical use of storage at the same time.

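With the forever forward incremental method, a full copy and a chain of subsequent increments are created and kept until the configured retention period is reached (say, N days). On day N the last increment of the chain is written, and the next run of the backup job does the following:

  1. A new increment is created and added to the end of the chain.
  2. The oldest increment is merged into the full backup file, so the full backup moves forward by one recovery point.
  3. The merged increment is deleted, and the chain again holds the number of recovery points set by the retention policy.

From then on, the same operation is repeated on every run as new increments are added to the chain.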

The total number of read/write operations remains the same as with reverse incremental backups, but what matters is how the data is processed. Creating an increment requires only one I/O operation per block, so the virtual machine snapshot stays open for a shorter time. The remaining 2 read/write operations are needed to update the full backup file, and the snapshot is no longer involved at that point. In addition, producing a new synthetic full backup comes down to merging a single increment rather than a whole chain of increments, as would be the case with forward incremental backups with synthetic fulls. The merge of the oldest increment into the full copy happens outside the backup window, with no load on the production environment, so more backups fit into the window (with a synthetic full, the blocks are merged within the same window as the backups themselves).
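
Here is a rough sketch of that split between work done while the snapshot is open and work done afterwards. It reuses the simplified model of a backup file as a dictionary of block id to contents; `forever_forward_run` and its parameters are hypothetical names chosen for illustration, not Veeam's implementation.

```python
def forever_forward_run(full, increments, changed, retention):
    """One backup run; returns (I/O with the snapshot open, I/O of the later merge)."""
    # Phase 1, snapshot open: write the new increment, one write per block.
    increments.append(dict(changed))
    snapshot_io = len(changed)

    # Phase 2, snapshot already released: if the chain is too long,
    # fold the oldest increment into the full backup and delete it.
    merge_io = 0
    while 1 + len(increments) > retention:
        oldest = increments.pop(0)
        for block_id, data in oldest.items():
            merge_io += 1            # read the block from the oldest increment
            full[block_id] = data    # write it into the full backup
            merge_io += 1
    return snapshot_io, merge_io


full = {0: "a0", 1: "b0"}
increments = []
results = []
for changed in ({0: "a1"}, {1: "b2"}, {0: "a3"}):
    results.append(forever_forward_run(full, increments, changed, retention=3))

print(results)
print(full, increments)
```

In this run the first two jobs only write increments. On the third job the chain would exceed 3 recovery points, so the oldest increment is merged into the full backup after the snapshot is released: one operation with the snapshot open, two more outside the backup window.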

PS


All of the algorithms above are illustrated more clearly in Veeam KB-1799.

Source: https://habr.com/ru/post/242983/

