
On the Day of the End of the World, it is fitting to recall what a company's policy for backing up and recovering data after failures and disasters should be.
When major cataclysms such as Hurricane Sandy and the flooding of New York occur, companies remember their "insurance": was there a backup, was it lost along with the original data, could applications and data be restored from it, did the backup process cover all of the production system or only part of it, and how long will recovery take?
The answers to these questions depend on how the company treated its "insurance" in the first place: whether the data protection project was well thought out and funded, whether the backup process was part of a comprehensive business continuity plan, or was just a patchwork of incomplete measures.
Some companies do not pay enough attention to modeling threats to the data and applications in their infrastructure, do not test backups, and do not verify that a system can be restored within the recovery time objective (RTO) required by the SLA.
Often, small and medium-sized businesses (SMBs) do not have a sufficiently reliable plan for keeping their systems resilient. Some consider such a plan too complicated and costly for themselves; others simply have "no time" to think about the reliability of their data storage; still others believe that nothing bad can ever happen to the data.
And even if a backup product is installed, often no one tests the backups to verify that the system can actually be restored from them when a failure occurs.
There is now a large number of backup solutions on the market, differing in functionality and price, and the challenge for an SMB company is to develop a rational data protection strategy and find a product that is simple, productive, and reasonably priced. The need to place the backup repository somewhere adds the further problem of choosing appropriate storage hardware.
So what can you advise when developing a backup strategy?

Disk-to-disk backup with block-level changes
Usually a disk-to-disk (D2D) backup operation is faster, and the data recovery operation is much faster, than disk-to-tape. Block-level backups are also much faster than file-level backups, because changes are tracked on much smaller portions of data; as a result, the backup window shrinks. If a company must comply with legal requirements that mandate long-term retention, it is reasonable to combine the disk-to-disk (D2D) and disk-to-tape (D2T) approaches in a disk-to-disk-to-tape (D2D2T) configuration: keep backups from, say, the last 30 days on disk storage and use tape for long-term retention (the per-unit cost of tape storage is still unmatched, despite the falling per-unit cost of disk storage). The advantages of this approach are also described in this and this post.
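To make the tiering rule concrete, here is a minimal Python sketch; the 30-day disk retention threshold and the dates are illustrative assumptions, not part of any particular product:

```python
from datetime import datetime, timedelta

# Hypothetical D2D2T retention rule: recent restore points stay on disk
# for fast recovery, older ones go to tape for long-term storage.
DISK_RETENTION = timedelta(days=30)

def tier_for(backup_date: datetime, now: datetime) -> str:
    """Return the storage tier a restore point belongs to."""
    return "disk" if now - backup_date <= DISK_RETENTION else "tape"

now = datetime(2012, 12, 21)
for restore_point in [datetime(2012, 12, 20), datetime(2012, 10, 1)]:
    print(restore_point.date(), "->", tier_for(restore_point, now))
# 2012-12-20 -> disk
# 2012-10-01 -> tape
```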
Backup deduplication
Deduplication is the process of eliminating duplicate blocks from backup data. It is especially effective in a virtualization environment, because many virtual machines are created from a single template and contain an identical set of software. Read more about backup deduplication here.
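The principle fits in a few lines of Python: split the stream into blocks, key each block by its hash, and store each unique block only once. The block size and the sample "VM images" are assumptions for illustration:

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size

def dedup_store(data: bytes, store: dict) -> list:
    """Store each unique block once, keyed by its hash.
    Returns the list of block hashes needed to reassemble the data."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # duplicate blocks are skipped
        recipe.append(digest)
    return recipe

store = {}
vm1 = b"identical OS image " * 1000  # two VMs from the same template
vm2 = b"identical OS image " * 1000
dedup_store(vm1, store)
dedup_store(vm2, store)
print("unique blocks stored:", len(store))  # far fewer than total blocks
```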
If you use virtualization, use specialized backup products.
There are "old" backup products, created for backing up physical machines. They install special agent programs inside the machines and copy data from within, usually at the file level of the corresponding operating system. There are also "new" products, created specifically for virtualization systems, which use the new technological capabilities of virtual environments. They do not require agents, place less load on the virtualization server, and run much faster thanks to new technologies such as the vStorage API. Read more about the benefits of specialized backup products here.
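A purely conceptual Python sketch of the changed block tracking idea behind such APIs (the block reader and the set of changed blocks are invented for illustration; a real implementation would get them from the hypervisor):

```python
# Conceptual changed block tracking (CBT): the hypervisor records which
# blocks changed since the last backup, so an incremental backup
# reads only those blocks instead of scanning the whole disk.

changed_blocks = {17, 42, 1003}  # block indexes reported as changed

def read_block(disk, index):     # hypothetical block reader
    return disk[index]

def incremental_backup(disk, changed):
    """Copy only the blocks the hypervisor marked as changed."""
    return {i: read_block(disk, i) for i in changed}

virtual_disk = {i: b"\x00" * 4096 for i in range(2048)}
increment = incremental_backup(virtual_disk, changed_blocks)
print(f"copied {len(increment)} of {len(virtual_disk)} blocks")
```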
Replication, or a backup schedule with a minimal period
How long can new or changed data on a production system remain unprotected? This period is called the recovery point objective (RPO), and modern backup products have made good progress in minimizing it, down to as little as 15 minutes. Using replication makes it possible to approach continuous data protection (near-CDP).
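In other words, RPO is a simple invariant you can monitor: the newest restore point must never be older than the chosen period. A minimal sketch, assuming a 15-minute target:

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=15)  # assumed recovery point objective

def rpo_met(last_backup: datetime, now: datetime) -> bool:
    """True if the newest restore point is within the RPO window."""
    return now - last_backup <= RPO

last_replica = datetime(2012, 12, 21, 11, 50)
print(rpo_met(last_replica, datetime(2012, 12, 21, 12, 0)))   # True
print(rpo_met(last_replica, datetime(2012, 12, 21, 12, 30)))  # False: RPO violated
```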
Backing up on the principle of "copy everything except ..."
A strategy of copying only selected user data ("copy nothing except explicitly included objects") can result in important user data not being fully backed up. For example, this can happen when a user asks for their working directory to be included in the backup plan and then creates other working directories outside that directory's subtree. The same can happen when a user installs a new program that saves their working files to a directory not configured for backup, or when a new version of a program moves the storage location of working files to a place that, again, is not backed up.
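A sketch of the safer "copy everything except..." selection in Python; the exclusion patterns are illustrative assumptions:

```python
import fnmatch
import os

# Back up all paths, skipping only an explicit exclusion list, so that
# newly created directories and files are covered by default.
EXCLUDE = ["*/tmp/*", "*.iso", "*/pagefile.sys"]

def should_back_up(path: str) -> bool:
    return not any(fnmatch.fnmatch(path, pattern) for pattern in EXCLUDE)

def walk_backup_set(root: str):
    """Yield every file under root that is not explicitly excluded."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if should_back_up(path):
                yield path

# Example: for path in walk_backup_set("/home"): copy(path)
```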
Fast data recovery
Failures are caused not only by disasters or by power and equipment outages: up to a third of incidents are due to human error (for example, accidental deletion of a file). For this reason, it is often necessary to restore not an entire server or network segment but a single file or application object. Use backup products that support granular data recovery; this significantly reduces recovery time and simplifies the whole process for the administrator. More information about granular recovery can be found here.
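As a toy illustration of granularity, here is how a single file could be pulled out of an ordinary tar-based backup archive instead of restoring the whole server (the archive and file names are hypothetical; real products expose this through their own interfaces):

```python
import tarfile

def restore_single_file(archive: str, member: str, target_dir: str) -> None:
    """Extract one file from a backup archive instead of the whole server."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extract(member, path=target_dir)

restore_single_file("daily-backup.tar.gz",
                    "home/alice/report.docx",
                    "/tmp/restore")
```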
Offsite backup storage
It is reasonable to keep backup copies separately from the original data (and, if possible, as far away as possible), since the event that caused the failure may destroy both the original data and its copies. There are two common options for cloning a backup obtained from disk-to-disk copying:
- storing additional copies on tapes (which can be taken to another office or a bank safe-deposit box); this D2D2T scheme is currently the most common,
- duplicating copies to the cloud, for example to Amazon.
In addition, you can go down the path of creating a hot recovery site: clone the infrastructure in another office and continuously replicate data and application configurations between the offices. In the event of a failure, users switch to the backup site.
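For the cloud option above, a minimal sketch using the boto3 library to push a finished D2D backup file to Amazon S3 (the bucket, key, and file path are assumptions; credentials come from the standard AWS configuration):

```python
import boto3

s3 = boto3.client("s3")

def clone_backup_to_cloud(local_path: str, bucket: str, key: str) -> None:
    """Upload a finished D2D backup file to S3 as the offsite copy."""
    s3.upload_file(local_path, bucket, key)

clone_backup_to_cloud("/backups/daily-backup.tar.gz",
                      "example-offsite-backups",
                      "2012-12-21/daily-backup.tar.gz")
```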
Virtualization of critical applications (or their redundant cluster nodes)
Here I would just like to note that using virtualization by itself makes recovery after failures easier, since it isolates virtual machines from specific hardware. A virtual machine can be started on a virtualization server with any hardware, which need not match the hardware it ran on before the crash. Many virtualization systems also provide functionality for moving a virtual machine between hosts in the event of a failure, as well as fault tolerance and clustering.
Backup Testing
You should conduct data recovery testing regularly, rather than simply testing the integrity of your backups: verify that the restored systems are operational and contain correct data, not just that the checksums of backup file blocks match. When the production network runs in a virtual environment, backup products created specifically for virtual environments make this verification automated and transparent for the administrator, since they can create virtual sandbox test labs isolated from the production network. Read more about this in this post. The SureBackup technology is described here.
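The difference between a "checksum test" and a "recovery test" can be sketched like this (the file paths and the health-check URL of the sandboxed system are hypothetical):

```python
import hashlib
import urllib.request

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify_restore(original: str, restored: str, health_url: str) -> bool:
    """A real test checks that the restored system works,
    not just that the backup data matches."""
    data_ok = sha256_of(original) == sha256_of(restored)
    # Application-level check: the restored (sandboxed) service responds.
    try:
        app_ok = urllib.request.urlopen(health_url, timeout=5).status == 200
    except OSError:
        app_ok = False
    return data_ok and app_ok

# verify_restore("/data/db.bak", "/sandbox/db.bak", "http://sandbox-vm/health")
```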
Backup Information Security
The source data is usually protected by access rights. After this data is transferred to backup copies, you need to ensure that only authorized access to them remains possible. There are several options: for example, you can encrypt the backups, make them inaccessible to unauthorized network users, or even block network access from the segment where regular users work (administrators can connect from another segment).
Why host-level backup in a virtualized environment reduces information security risks is explained in this post.
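As for the encryption option, a minimal sketch using the cryptography library's Fernet recipe (the file name is an assumption, and key management is deliberately out of scope here):

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store this key separately and securely!
cipher = Fernet(key)

# Encrypt a finished backup file so only holders of the key can read it.
with open("daily-backup.tar.gz", "rb") as f:       # hypothetical file name
    ciphertext = cipher.encrypt(f.read())

with open("daily-backup.tar.gz.enc", "wb") as f:
    f.write(ciphertext)

# Restore: cipher.decrypt(ciphertext) returns the original bytes.
```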
***
Additional material on backup strategies in a virtual environment: