
Backup, Part 1: Purpose, Overview of Methods and Technologies

Backup? I don't need backup!!

Why make backup copies at all? The hardware is very, very reliable, and besides there are “clouds” that are more reliable than physical servers: properly configured, a “cloud” server will easily survive the failure of an underlying physical server, and from the point of view of the service's users there will be only a small, barely noticeable hiccup in the service. Moreover, duplicating information often means paying for “extra” CPU time, disk load, and network traffic.

The ideal program works fast, does not leak memory, has no security holes, and does not exist.

-Unknown
Since programs are still written by protein-based developers, the testing process is often absent, and delivery rarely follows the “best practices” (which are themselves programs, and therefore not ideal), system administrators often have to solve problems that sound short but expressive: “put it back the way it was”, “get the database back to normal operation”, “it works slowly - roll it back”, and my favorite, “I don't know what it is, but fix it”.

Besides the logical errors that result from careless work by developers, from coincidences, or from incomplete knowledge or misunderstanding of the finer details of how programs are built - including related and system-level ones such as operating systems, drivers, and firmware - there are other errors as well. For example, most developers rely on the runtime and completely forget about the physical laws that programs still cannot circumvent: the supposedly infinite reliability of the disk subsystem and of any storage subsystem in general (including RAM and the processor cache!), zero processing time on the CPU, no errors during transmission over the network or during processing on the CPU, and network latency equal to zero. Nor should you neglect the notorious deadline, because if you miss it, you will get problems worse than any nuances of the network and the disk.
“Boss, it's all gone!” - based on the film “The Diamond Arm”

How do we deal with these problems when they arise in full force and loom over valuable data? There is nothing to replace live developers with, and it is not certain that this will become possible in the near future. On the other hand, only a handful of projects have managed to fully prove that a program will work as intended, and it is by no means possible to simply take such proofs and apply them to other, similar projects. Such proofs also take a lot of time and require special skills and knowledge, which, given the deadlines, practically rules out using them. Besides, we still do not possess an ultra-fast, cheap, and infinitely reliable technology for storing, processing, and transmitting information. If such technologies exist at all, it is only as concepts, or - most often - only in science fiction books and films.
Good artists copy, great artists steal.

—Pablo Picasso.
The most successful solutions and surprisingly simple things usually appear where concepts, technologies, knowledge, and fields of science that at first glance seem completely incompatible meet.

For example, birds and planes both have wings, and although the results are completely different (even if very similar), the functional similarity holds: the principle of operation in some modes is the same, and the technical problems are solved in the same way - hollow bones, the use of strong and lightweight materials, and so on. The best patterns we see in our technology are also, for the most part, borrowed from nature: watertight compartments in ships and submarines are a direct analogy to annelid worms; building RAID arrays and checking data integrity mirrors the duplication of the DNA chain; and paired organs, the independence of various organs from the central nervous system (the heart beating automatically), and reflexes resemble autonomous systems on the Internet. Of course, taking ready-made solutions and applying them head-on is fraught with problems, but who knows, maybe there are no other solutions.
If only you knew where you would fall, you would lay some straw there!

—Belarusian proverb
So backups are vital for those who wish:


Here is a bit of theory.
Any classification is arbitrary. Nature does not classify; we classify because it is more convenient for us. And the criteria we classify by are also chosen arbitrarily.

—Jan Bryuler
Regardless of the physical storage method, logical data storage can be divided into two ways of accessing the data: block and file. This division has recently become very blurred, since purely block or purely file logical storage does not really exist. However, for simplicity, we will assume that it does.

Block data storage implies that there is a physical device to which data is written in fixed-size chunks - blocks. Blocks are accessed by address; each block has its own address within the device.

A backup is usually made by copying blocks of data. To guarantee data integrity at the moment of copying, writing of new blocks and modification of existing ones are suspended. If we take an analogy from the everyday world, the closest one is a closet of identical numbered cells.

Block data storage
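
As a minimal illustration of a block-level backup, a whole partition can be copied with dd while nothing is writing to it (the device name and target path here are assumptions, not taken from the test bench described later):

# copy the whole block device into an image file, block by block;
# writes to the device should be suspended or the volume frozen first
# (/dev/sdb1 and /backup/ are assumed names)
dd if=/dev/sdb1 of=/backup/sdb1.img bs=4M conv=noerror,sync

Restoring is the same command with if= and of= swapped.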

File storage, as a kind of logical device, is close to block storage and is often organized on top of it. The important differences are the presence of a storage hierarchy and human-readable names. An abstraction is introduced in the form of a file - a named region of data - and a directory - a special file that stores descriptions of, and access paths to, other files. Files can carry additional metadata: creation time, access flags, and so on. Backups are usually made like this: modified files are found and then copied into another file storage with an identical structure. Data integrity is usually ensured by the absence of files that are currently being written to. File metadata is backed up in the same way. The closest analogy is a library with sections of different books and a catalog with the books' human-readable titles.

File storage
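
A naive sketch of this “find modified files, then copy them” approach, assuming hypothetical /srv/data and /backup paths:

# copy only the files changed since the previous run,
# preserving the directory layout (paths are assumptions)
cd /srv/data
find . -type f -newer /backup/.last-run -print0 | cpio --null -pdm /backup/data
touch /backup/.last-run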

Recently yet another option has become widespread - one from which, in principle, file storage once started, and which has the same archaic traits: object data storage.

It differs from file storage in that it has no nesting deeper than one level (a flat scheme), and file names, although human-readable, are still better suited for processing by machines. When backing up, object storage is most often treated like file storage, although occasionally other options exist.
- There are two types of system administrators: those who do not make backups, and those who already do.
- Actually, there are three: there are also those who verify that their backups can be restored.

-Unknown
You should also understand that the backup process itself is carried out by programs, so it suffers from all the same disadvantages as any other program. To reduce (not eliminate!) the dependence on the human factor, as well as on individual peculiarities - which do not matter much on their own but together can produce a noticeable effect - the so-called 3-2-1 rule is applied. There are many ways to interpret it, but I prefer the following: 3 sets of the same data must be kept, 2 of the sets must be stored in different formats, and 1 set must be kept in a geographically remote storage.

The storage format here should be understood as follows:

  • If there is a dependence on the physical storage method, change the physical method.
  • If there is a dependence on the logical storage method, change the logical method.

To achieve the maximum effect of the 3-2-1 rule, it is recommended to change the storage format in both ways.

From the point of view of a backup's readiness for its intended purpose - restoring operation - backups are divided into “hot” and “cold”. Hot backups differ from cold ones in only one thing: they are immediately ready for use, while cold ones require some additional steps before recovery: decryption, extraction from an archive, and so on.

Do not confuse hot and cold copies with online and offline copies, which imply physical isolation of the data and are, in fact, another axis of classification of backup methods. So an offline copy - one not connected directly to the system where it is to be restored - can be either hot or cold (in terms of readiness for recovery). An online copy is accessible directly where it needs to be restored, and is most often hot, but cold ones also exist.

Also, do not forget that the backup process usually does not end with a single backup copy; there may be quite a few of them. Therefore, one has to distinguish full backups - those that can be restored independently of any other backup - from differential ones (incremental, differential, decremental, and so on), which cannot be restored on their own and require the prior restoration of one or more other backups.

Incremental copies are an attempt to save the space taken by backup storage: only the data that has changed since the previous backup is written into the new copy.
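
A minimal sketch of incremental copies with GNU tar (the paths are assumptions): the snapshot file data.snar records what has already been saved, so each subsequent run stores only the changes.

# level 0: a full archive, plus a snapshot file describing its contents
tar -czf /backup/data-full.tar.gz --listed-incremental=/backup/data.snar /srv/data
# later runs with the same snapshot file write only what changed since the previous run
tar -czf /backup/data-inc-$(date +%F).tar.gz --listed-incremental=/backup/data.snar /srv/data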

Decremental copies are created with the same goal but in a slightly different way: a full backup is taken, yet what is actually stored is only the difference between the fresh copy and the previous one.

The backup process on top of storage that supports deduplication deserves separate mention. If you write full backups onto such storage, only the difference between the backups is physically recorded, yet restoring a backup is the same as restoring from a full copy and is completely transparent.
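
Dedicated tools (zbackup, restic, borgbackup, reviewed later in this series) implement this idea properly; purely to illustrate the principle, here is a toy sketch of a content-addressed store, with all paths and the sample file being assumptions, where every unique chunk is kept only once, so repeated “full” backups cost only the space of the changed chunks.

# toy deduplicating store: paths and the source file are assumed names
STORE=/backup/chunks; MANIFEST=/backup/manifests/$(date +%F).list
mkdir -p "$STORE" /backup/manifests /tmp/chunks
split --bytes=1M -d /srv/data/site.db /tmp/chunks/part.    # cut the file into 1 MiB chunks
for c in /tmp/chunks/part.*; do
    h=$(sha256sum "$c" | cut -d' ' -f1)
    [ -e "$STORE/$h" ] || cp "$c" "$STORE/$h"              # store each unique chunk once
    echo "$h" >> "$MANIFEST"                               # the manifest preserves chunk order
done
rm -r /tmp/chunks
# restore: concatenate the chunks listed in the manifest, e.g.
# xargs -I{} cat "$STORE/{}" < "$MANIFEST" > /restore/site.db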

Quis custodiet ipsos custodes?

(Who will guard the guards themselves? - lat.)


It is very unpleasant when there are no backups, but it is much worse if a backup seems to have been made, and yet at restore time it turns out that it cannot be restored, because:

  • The integrity of the source data was broken.
  • Backup storage is damaged.
  • Recovery works very slowly, and data that has been only partially restored cannot be used.


A well-designed backup process must take these points into account, especially the first two.

The integrity of the source data can be guaranteed in several ways. The most commonly used are the following: a) creating file system snapshots at the block level, b) “freezing” the state of the file system, c) a special block device with versioned storage, d) sequential recording of files or blocks. Checksums are also used so that the data can be validated during recovery.
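
For example, checksums can be recorded next to the backup when it is created and verified before restoring (the paths and dates are assumptions):

# when creating the backup: record a checksum for every archive in the set
sha256sum /backup/2019-04-01/*.tar.gz > /backup/2019-04-01/SHA256SUMS
# before restoring: make sure none of the archives have been corrupted
sha256sum --check /backup/2019-04-01/SHA256SUMS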

Damage to the backup storage can also be detected using checksums. An additional method is the use of specialized devices or file systems on which already recorded data cannot be changed, but new data can be added.

To speed up recovery, data is restored by multiple processes, provided there is no bottleneck in the form of a slow network or a slow disk subsystem. To get around the situation of partially restored data, you can split the backup process into relatively small subtasks, each of which is performed separately. This makes it possible to restore operation step by step, with a predictable recovery time. This problem most often lies in the organizational plane (SLA), so we will not dwell on it in detail.
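
As a rough sketch, if the backup has already been split into independent per-directory archives (the naming scheme is an assumption), they can be unpacked several at a time:

# restore up to 4 independent archives in parallel (paths are assumed)
ls /backup/parts/*.tar.gz | xargs -P 4 -I {} tar -xzf {} -C /restore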

The one who knows spices well is not the one who adds them to every dish, but the one who never adds anything extra.

-A. Sinyavsky


The practice may vary in terms of the software that system administrators use, but the general principles are, one way or another, the same, in particular:

  • It is highly recommended to use ready-made solutions.
  • Programs should work predictably, i.e. there should be no undocumented features or bottlenecks.
  • Setting up each program should be simple enough that you do not have to read the manual or a cheat sheet every time.
  • If possible, the solution should be universal, since servers can vary very widely in their hardware characteristics.


The following common programs exist for making backups of block devices:

  • dd, familiar to veterans of system administration; similar programs also belong here (dd_rescue, for example).
  • Utilities built into some file systems that create a dump of the file system.
  • Omnivorous utilities; for example, partclone.
  • Proprietary, often vendor-specific solutions; for example, Norton Ghost and later products.


For file systems, the backup task is partially solved by the methods applicable to block devices, but it can be solved more efficiently, for example with:

  • rsync, a universal program and protocol for synchronizing the state of file systems (see the sketch after this list).
  • Built-in archiving tools (ZFS).
  • Third-party archiving tools; the most popular representative is tar, but there are others, for example dar - a replacement for tar aimed at modern systems.
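
For illustration, two typical invocations, with host and path names that are assumptions: rsync mirrors the data to a backup host while preserving metadata, and tar packs it into a dated archive.

# mirror the site files to the backup server, keeping permissions, ACLs and xattrs
rsync -aAX --delete /srv/www/ backup@backup-host:/backup/www/
# or pack everything into a dated archive
tar -czf /backup/www-$(date +%F).tar.gz -C /srv www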

Separate mention should be made of the tools that ensure data consistency when creating backups. The most commonly used options are:

  • Mounting the file system read-only (ReadOnly), or freezing the file system (freeze) - a method of limited applicability.
  • Creating snapshots of the file system or block device (LVM, ZFS); a sketch using an LVM snapshot follows after this list.
  • Using third-party tools to take a snapshot even in cases where the previous options cannot be provided for some reason (programs like hotcopy).
  • The copy-on-write technique (CopyOnWrite); however, it is most often tied to the file system in use (BTRFS, ZFS).
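
A sketch of the snapshot approach with LVM (the volume group, volume, and mount point names are assumptions): the snapshot freezes a consistent view of the data while the archive is taken from it.

# take a snapshot of the data volume; 2G is reserved for blocks changed while it exists
# (vg0/data, /mnt/snap and /backup are assumed names)
lvcreate --snapshot --name data_snap --size 2G /dev/vg0/data
mount -o ro /dev/vg0/data_snap /mnt/snap
# archive the frozen view, then drop the snapshot
tar -czf /backup/data-$(date +%F).tar.gz -C /mnt/snap .
umount /mnt/snap
lvremove -f /dev/vg0/data_snap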


So, for a small server, you need to provide a backup scheme that meets the following requirements:


Candidates that more or less meet the requirements:


Procrustean bed

A virtual machine (based on XenServer) with the following characteristics will be used as a test bench:


Almost the same machine will be used as the backup destination server, only with a 500 GB hard drive.

The operating system is CentOS 7 x64: the partitioning is the standard one, and an additional partition will be used as the data source.

As the source data, let us take a WordPress website with 40 GB of media files and a MySQL database. Since virtual servers vary greatly in their characteristics, and also for better reproducibility, here are the server test results obtained with sysbench.
sysbench --threads=4 --time=30 --cpu-max-prime=20000 cpu run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Prime numbers limit: 20000

Initializing worker threads ...

Threads started!

CPU speed:
events per second: 836.69

Throughput:
events/s (eps): 836.6908
time elapsed: 30.0039s
total number of events: 25104

Latency (ms):
min: 2.38
avg: 4.78
max: 22.39
95th percentile: 10.46
sum: 119923.64

Threads fairness:
events (avg/stddev): 6276.0000/13.91
execution time (avg/stddev): 29.9809/0.01

sysbench --threads=4 --time=30 --memory-block-size=1K --memory-scope=global --memory-total-size=100G --memory-oper=read memory run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: read
scope: global

Initializing worker threads ...

Threads started!

Total operations: 50900446 (1696677.10 per second)

49707.47 MiB transferred (1656.91 MiB/sec)

Throughput:
events/s (eps): 1696677.1017
time elapsed: 30.0001s
total number of events: 50900446

Latency (ms):
min: 0.00
avg: 0.00
max: 24.01
95th percentile: 0.00
sum: 39106.74

Threads fairness:
events (avg/stddev): 12725111.5000/137775.15
execution time (avg/stddev): 9.7767/0.10

sysbench --threads=4 --time=30 --memory-block-size=1K --memory-scope=global --memory-total-size=100G --memory-oper=write memory run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: write
scope: global

Initializing worker threads ...

Threads started!

Total operations: 35910413 (1197008.62 per second)

35068.76 MiB transferred (1168.95 MiB/sec)

Throughput:
events/s (eps): 1197008.6179
time elapsed: 30.0001s
total number of events: 35910413

Latency (ms):
min: 0.00
avg: 0.00
max: 16.90
95th percentile: 0.00
sum: 43604.83

Threads fairness:
events (avg/stddev): 8977603.2500/233905.84
execution time (avg/stddev): 10.9012/0.41

sysbench --threads=4 --file-test-mode=rndrw --time=60 --file-block-size=4K --file-total-size=1G fileio run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Extra file open flags: (none)
128 files, 8MiB each
1GiB total file size
Block size 4KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads ...

Threads started!

Throughput:
read: IOPS=3868.21 15.11 MiB/s (15.84 MB/s)
write: IOPS=2578.83 10.07 MiB/s (10.56 MB/s)
fsync: IOPS=8226.98

Latency (ms):
min: 0.00
avg: 0.27
max: 18.01
95th percentile: 1.08
sum: 238469.45

This note begins a large series of articles about backup:
  1. Backup, Part 1: Why do I need backup, review of methods, technologies
  2. Backup, part 2: Review and test rsync-based backup tools
  3. Backup, part 3: Review and test duplicity, duplicaty, deja dup
  4. Backup, part 4: zbackup, restic, borgbackup review and testing
  5. Backup, Part 5: Bacula and veeam backup for linux testing
  6. Backup Part 6: Comparing Backup Tools
  7. Backup, Part 7: Conclusions

Source: https://habr.com/ru/post/449282/

