Any classification is arbitrary. Nature does not classify; we classify because it is more convenient for us, and we classify according to data that we also pick arbitrarily.
—Jan Bryuler
Regardless of the physical storage method, logical data storage can be divided into two ways of accessing the data: block and file. This division has become quite blurry of late, since neither purely block nor purely file logical storage really exists; for simplicity, however, we will assume that they do.
Block data storage implies a physical device onto which data is written in fixed-size chunks, blocks. Blocks are accessed by address: each block has its own address within the device.
A backup is usually made by copying the data blocks. To ensure data integrity at the moment of copying, the writing of new blocks and the modification of existing ones are suspended. If we take an analogy from the everyday world, the closest one is a cabinet of identical numbered cells.
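As a minimal sketch of the block-copy principle, dd can be pointed at an ordinary file standing in for a block device (all filenames and the block size here are illustrative):

```shell
# Create a 1 MiB "disk image" that stands in for a real block device
dd if=/dev/urandom of=disk.img bs=4096 count=256 2>/dev/null

# Back it up chunk by chunk; bs sets the block size used for copying
dd if=disk.img of=disk.img.bak bs=4096 2>/dev/null

# Verify the copy is bit-for-bit identical to the source
cmp -s disk.img disk.img.bak && echo "backup OK"
```

In real life the input would be a device node such as /dev/sdX, and writes to it would have to be suspended for the duration of the copy, exactly as described above.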

File storage is, as a logical device, close to block storage and is often organized on top of it. The important differences are the existence of a storage hierarchy and human-readable names. It introduces the abstraction of a file, a named region of data, and of a directory, a special file that stores descriptions of, and access to, other files. Files can carry additional metadata: creation time, access flags, and so on. Backups are usually made like this: modified files are found and then copied to another file storage identical in structure. Data integrity is usually ensured by the absence of files that are currently being written to. File metadata is backed up in the same way. The closest analogy is a library with sections holding different books, plus a catalog with the books' human-readable names.
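The "find modified files, copy them into an identical structure" step can be sketched with standard tools; the directory names and the timestamp marker file here are illustrative:

```shell
mkdir -p src backup/src
echo "v1" > src/a.txt
echo "v1" > src/b.txt

# Full pass: copy everything, then record the time of this backup
cp -r src/. backup/src/
touch .last-backup

# Later, only files modified since the marker need to be re-copied,
# preserving the directory structure (--parents keeps the src/ prefix)
sleep 1
echo "v2" > src/a.txt
find src -type f -newer .last-backup -exec cp --parents {} backup/ \;
```

Real tools (rsync among them, discussed below) do essentially this, plus metadata handling and deletion tracking.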

Recently a third option has been gaining ground, the one from which, in essence, file storage originally grew, and which retains the same archaic traits: object storage.
It differs from file storage in that it has at most one level of nesting (a flat scheme), and although object names are human-readable, they are still better suited for processing by machines. For backup purposes, object storage is most often treated like file storage, but occasionally there are other options.
- There are two kinds of system administrators: those who do not make backups yet, and those who already do.
- Actually, there are three kinds: there are also those who verify that their backups can be restored.
-Unknown
You should also understand that the backup process itself is carried out by programs, so it is prone to the same flaws as any other program. To reduce (not eliminate!) the dependence on the human factor, as well as on individual quirks that matter little on their own but can add up to a tangible effect, the so-called 3-2-1 rule is applied. There are many ways to read it, but I prefer the following interpretation: 3 sets of the same data must be stored, 2 of those sets must be stored in different formats, and 1 set must be kept in a geographically remote storage.
By a change of storage format, the following should be understood:
- If there is a dependence on the physical storage method, change the physical method.
- If there is a dependence on the logical storage method, change the logical method.
To get the maximum effect from the 3-2-1 rule, it is recommended to vary the storage format in both ways.
From the point of view of a backup's readiness for its intended purpose, restoring functionality, backups are divided into "hot" and "cold". They differ in only one thing: hot backups are ready for use immediately, while cold ones require some additional steps before recovery: decryption, extraction from an archive, and so on.
Do not confuse hot and cold copies with online and offline copies, which imply physical isolation of the data and are, in fact, another axis of classification of backup methods. An offline copy, one not connected directly to the system where it must be restored, can be either hot or cold (in terms of readiness for recovery). An online copy is directly accessible where it must be restored; it is most often hot, but cold ones also occur.
In addition, do not forget that the backup process usually does not end with the creation of a single backup, so there may be quite a large number of copies. Therefore one must distinguish full backups, those restorable independently of other backups, from differential copies (incremental, differential, decremental, etc.), those that cannot be restored on their own and require the prior restoration of one or more other backups.
Incremental backups are an attempt to save backup storage space: only the data changed since the previous backup is written to the copy.
Decremental backups are created for the same purpose but in a slightly different way: a full backup is taken, yet only the difference between the fresh copy and the previous one is actually stored.
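GNU tar's listed-incremental mode is one concrete implementation of the incremental scheme; all filenames here are illustrative:

```shell
mkdir -p data
echo "one" > data/one.txt

# Level 0: a full backup; tar records file state in the snapshot file
tar -cf full.tar -g state.snar data

# A new file appears between backups
echo "two" > data/two.txt

# Level 1: only what changed since the snapshot goes into the archive
tar -cf incr.tar -g state.snar data

# Restoring requires the full archive first, then each increment in order
mkdir -p restore
tar -xf full.tar -g /dev/null -C restore
tar -xf incr.tar -g /dev/null -C restore
```

Note the restore order: exactly as the text says, the incremental copy is useless on its own and only makes sense on top of the full one.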
Worth separate mention is the backup process on top of storage that supports deduplication. If full backups are written onto such storage, only the difference between the backups is physically recorded, yet restoring a backup works just like restoring from a full copy and is completely transparent.
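A poor man's version of this scheme can be built from hard links: every backup looks like a complete directory tree, but unchanged files physically share storage with the previous copy (rsync's --link-dest option automates the same idea). A sketch with illustrative paths:

```shell
mkdir -p src
echo "big payload" > src/data.txt

# Backup 1: a plain full copy
cp -r src backup.1

# Backup 2: hard-link everything from backup 1 (cp -al links instead of
# copying); only files that changed would then be re-copied on top.
# Unchanged files occupy no extra space, yet backup.2 restores like a
# full copy.
cp -al backup.1 backup.2

# The unchanged file is the same inode in both backups -> stored once
stat -c %i backup.1/data.txt backup.2/data.txt
```

Dedicated deduplicating backup stores do this at the chunk level rather than the whole-file level, but the restore-side transparency is the same.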
Quis custodiet ipsos custodes?
(Who will guard the guards themselves? - lat.)
It is very unpleasant to have no backups at all, but it is much worse when a backup seems to have been made and then, during restoration, turns out to be unrestorable because:
- The integrity of the source data was broken.
- The backup storage is damaged.
- Recovery is very slow, and partially restored data cannot be used.
A well-built backup process must take these points into account, especially the first two.
The integrity of the source data can be guaranteed in several ways. The most commonly used are: a) taking snapshots of the file system at the block level, b) freezing the state of the file system, c) a dedicated block device with version storage, d) sequential dumping of files or blocks. Checksums are also applied so that the data can be validated during recovery.
Storage damage can likewise be detected with checksums. An additional method is the use of specialized devices, or of file systems in which already-recorded data cannot be modified, only appended to.
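Checksum verification covers both concerns, source validation at restore time and detection of storage damage, and can be done with standard tools; filenames here are illustrative:

```shell
mkdir -p backup
echo "payload" > backup/file.bin

# At backup time, record checksums alongside the data
( cd backup && sha256sum file.bin > SHA256SUMS )

# At restore or audit time, verification confirms the data is intact
( cd backup && sha256sum -c SHA256SUMS )

# Any bit flip in storage now fails verification
echo "corrupted" > backup/file.bin
( cd backup && sha256sum -c SHA256SUMS ) || echo "damage detected"
```

The same idea scales up: serious backup tools store per-chunk checksums and verify them automatically during restore.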
To speed up recovery, restoration is run with multiple processes, provided there is no bottleneck in the form of a slow network or a slow disk system. To get around the problem of partially restored data, the backup process can be broken into relatively small subtasks, each performed separately. It then becomes possible to restore service step by step with a predictable recovery time. This problem mostly lies in the organizational plane (SLA), so we will not dwell on it in detail.
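Splitting a backup into independent subtasks also makes parallel restore trivial; a sketch using xargs -P, with illustrative paths and one archive per subtask:

```shell
# Prepare three independent archives, one per subtask
mkdir -p data restore
for part in a b c; do
    echo "$part" > "data/$part.txt"
    tar -cf "part-$part.tar" -C data "$part.txt"
done

# Restore all parts with up to 3 processes in parallel;
# each archive is self-contained, so the order does not matter
ls part-*.tar | xargs -P 3 -I {} tar -xf {} -C restore
```

Because each subtask is restorable on its own, the most critical data can also be brought back first, which is exactly the step-by-step recovery mentioned above.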
He who knows a lot about spices is not the one who adds them to every dish, but the one who never adds anything extra to it.
-V. Sinyavsky
Practice regarding the software system administrators use may vary, but the general principles are, one way or another, the same; in particular:
- Using ready-made solutions is strongly recommended.
- Programs should behave predictably, i.e. there should be no undocumented features or bottlenecks.
- Configuring each program should be simple enough that you need not reread the manual or a cheat sheet every time.
- If possible, the solution should be universal, since servers can vary greatly in their hardware characteristics.
The following common programs exist for making backups from block devices:
- dd, familiar to veterans of system administration, along with kindred programs (dd_rescue, for example).
- Utilities built into some file systems that create a dump of the file system.
- Omnivorous utilities; for example, partclone.
- In-house, often proprietary, solutions; for example, Norton Ghost and its later versions.
For file systems, the backup task is partially solved by the methods applicable to block devices, but it can be solved more efficiently, for example with:
- rsync, a universal program and protocol for synchronizing file system state.
- Archiving tools built into the file system (ZFS).
- Third-party archiving tools; the most popular representative is tar, though there are others, for example dar, a replacement for tar aimed at modern systems.
Worth separate mention are the tools for ensuring data consistency when creating backups. The most commonly used options are:
- Mounting the file system read-only, or freezing it (freeze); the method is of limited applicability.
- Taking snapshots of the file system or block device (LVM, ZFS).
- Using third-party tools to take a snapshot, even in cases where the previous options cannot be applied for some reason (programs like hotcopy).
- Copy-on-write techniques (CopyOnWrite); however, these are usually tied to the file system in use (BTRFS, ZFS).