
Clouds to the rescue


I don't know about others, but personally I'm wary of fashionable words and technologies such as virtualization, clouds and BigData. They are alarming because, without digging into their essence and applicability, people blindly start using them where needed and where not, simply because "everyone else does!", eventually creating solutions with more problems than benefits (for example, deciding that a few gigabytes count as big data and deploying Hadoop, instead of adding memory and using Excel). So I chew through fashionable technologies especially scrupulously, trying to understand their specifics and area of applicability, so as not to succumb to that very "hammer syndrome". In this article I want to share some of my thoughts about cloud storage, all the more so since we started working on it at Acronis several years ago.

Where does childhood go?

Once my brother, like any normal person, was going on vacation with his family. On the eve of departure he came to our parents' home with an external hard drive and said: "Let it stay here." To the question "what is it and why?" he answered that it held backups of the family pictures he would not want to lose. After all, money and equipment can be restored by the method of earning more. Lost pictures of your children cannot be restored in any way, thanks to the vile second law of thermodynamics: time in our world is spent irrevocably. This is the main function of a backup: to stop flowing time. This is the value of a backup: frozen time. But here an interesting point arises: being valuable, the backup itself needs a backup! This is how the backup replication scenario comes about. And, as they say, it is unwise to put all your eggs in one basket, so backups should be dispersed in space. You can carry one to your parents, or you can send it to the cloud. My brother showed me my first example of offsite backup, and I remember it to this day.

Flesh of flesh: a backup of the backup

So, if we truly value our data, we need backup copies of our backups. However, as the number of copies grows, we face the same problem as the malicious copy-paste coder: the problem of keeping them consistent. For example, an external Western Digital drive holding a backup of my photos once died on me (after which I harbor a fierce distrust of WD). Since then I copy my pictures onto two disks, but that creates the extra burden of synchronizing the copies. Plugging and unplugging USB drives is not something I'd like to do for the rest of my life. It would be great if someone replicated the copies for me.
In the Acronis Backup & Recovery product, the product itself is able to replicate the created backup chains into additional storage. For example, this is how D2D2T (disk to disk to tape) or D2D2C (disk to disk to cloud) scripts are solved.
In Acronis True Image, backup redundancy is achieved by backing up to the cloud. Indeed, in the Acronis cloud as many as two fellows, Reed and Solomon, replicate the data across several storage nodes, so that if one node fails, no data is lost. I'll cover the implementation details of the cloud in more depth in another post.
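For illustration only: the cloud uses Reed-Solomon codes, and the toy sketch below uses XOR parity, the simplest single-parity special case of the same erasure-coding idea. Data is split across several nodes plus one parity node, and any single lost node can be rebuilt from the rest.

```python
# Toy illustration of erasure coding across storage nodes. The real
# cloud uses Reed-Solomon; XOR parity below is its simplest
# single-parity special case: any ONE lost node is rebuilt from the rest.
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_with_parity(data: bytes, n_data_nodes: int) -> list:
    """Split data into n_data_nodes chunks plus one parity chunk."""
    chunk_len = -(-len(data) // n_data_nodes)          # ceiling division
    padded = data.ljust(chunk_len * n_data_nodes, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len]
              for i in range(n_data_nodes)]
    return chunks + [reduce(xor_bytes, chunks)]        # last chunk = parity

def recover_missing(node_chunks: list) -> bytes:
    """XOR of all surviving chunks equals the single missing one."""
    return reduce(xor_bytes, [c for c in node_chunks if c is not None])

nodes = split_with_parity(b"family photos, frozen time", 4)
lost = nodes[2]
nodes[2] = None                                        # one node fails
assert recover_missing(nodes) == lost                  # ...and is rebuilt
```

Proper Reed-Solomon codes generalize this to surviving the loss of several nodes at once.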

Copy to cloud

So, backup to the cloud is the simplest solution for those who need redundancy. Let the data center replicate the data itself; our task is minimal: deliver the data to the data center. But it turns out this task is not so trivial in itself. The following (often unexpected) factors seriously affect how we can deliver data to the cloud:
  1. Geography
  2. Legislation
  3. Volume of copied data
  4. Data change rate
  5. Channel width
  6. Time the computer runs without rebooting
  7. The importance of data consistency

Geography and legislation
Given the global availability of the cloud, when you move data into it the question arises: are you running afoul of the state by doing so? For example, if the data center is located in Syria, users from the USA will not feel their data is as secure as they would like. In the Old World, standards for the transfer of personal data are rooted back in ancient Rome, so hauling tapes under escort is often the only legal way to transport data from bank to bank, never mind storing it somewhere.

Technical factors
If the previous factors mostly influence the business logic of backing up to the cloud, allowing users to choose certain data centers or not, the following five factors directly affect the technologies involved. Depending on the factor, different bottlenecks arise, and different technologies are called upon to fight them; we will look at those after a brief review of the factors.

Volume of copied data
The greater the volume, the greater the problem of delivering it anywhere. The copying process stretches out in time, amplifying the effects of the data change rate, the consistency requirements, and the machine's uptime. Accordingly, for freedom of maneuver one should try to get by with small volumes. If, of course, that is possible.

Data change rate
If data never changed, neither it nor its backup copies would be of much use (counting the destruction of data as one form of change). Depending on how often data changes, the question also arises whether the user needs to track every change, or whether some changes can be ignored. For example, when editing a DOC file the user may want to keep every saved version of the document (a kind of persistent high-level undo stack), while tracking the modification of each block of a database file is of little interest to anyone. The data change rate directly affects maintaining consistency, which is especially painful on production-loaded systems, where an open snapshot duplicates write I/O operations.

Channel width
If the channel is wide (like a processor bus or a fast local network), then the bottleneck is reading from the disk itself. Optimization then consists of minimizing disk I/O operations, which is achieved by single-pass disk copying, or imaging. However, if the bottleneck is the communication channel, then disk I/O ceases to play the key role and other factors come to the fore.

Time without rebooting
With large volumes of data and a slow channel, a backup can take a very long time. A friend who deals with BigData once told me, truthfully, that backing up a quarter's worth of analytical data to tape took a month and a half. When uploading to the cloud, twelve-hour processes are not uncommon, and a home user may well want to turn the computer off during that time. Accordingly, if no special measures are taken, copying the entire volume of data is doomed to fail.
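A quick back-of-envelope estimate shows why (the numbers below are illustrative assumptions, not measurements):

```python
# Back-of-envelope upload time (illustrative numbers, not measurements).
volume_gb = 200        # assumed size of a home system disk image
uplink_mbit = 10       # assumed home upload bandwidth, Mbit/s
efficiency = 0.8       # protocol overhead, retransmits, etc.

seconds = volume_gb * 8 * 1024 / (uplink_mbit * efficiency)
print(f"~{seconds / 3600:.0f} hours")   # ~57 hours, well over two days
```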

Data consistency
When backing up a system and its applications, the consistency of the files that make up that system or application is of particular importance. Suppose the copy process lasts an hour. If we copy files simply by enumerating them, the first file will reflect the state at one moment and the last file the state an hour later. And if during that hour the last file managed to change (quite normal for loaded systems), the backup captures a state that never existed in nature and therefore is under no obligation to work. For example, the registry may contain links to files that no longer exist, and other horrors of a system smeared across time. There are, however, backup scenarios where consistency does not matter. For example, each photo is self-sufficient and does not depend on the state of neighboring photos.

Cloud scenarios

Having considered the factors influencing the used technologies and scenarios, we will consider how they combine and what it results in.

Synchronization
If data consistency is not important, the most convenient way to protect it is synchronization. The high-level synchronization scheme is as follows (a minimal agent sketch follows the list):
  1. An agent is installed on every node to be synchronized.
  2. The agent is assigned an area to track (a folder, a set of folders, a whole volume).
  3. The agent tracks changes and immediately uploads them to the cloud when it is reachable, or postpones them until a connection appears.
  4. The agent subscribes to events in the cloud and replays them locally.
  5. File versioning for backup purposes happens in the cloud itself.
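For a feel of steps 1-3, here is a minimal agent sketch using the Python watchdog library; `upload_to_cloud` is a hypothetical stand-in for the real transport, and steps 4-5 (cloud events and versioning) happen server-side.

```python
# Minimal sketch of a sync agent (steps 1-3 above), built on the
# third-party watchdog library (pip install watchdog).
# upload_to_cloud is a hypothetical stand-in for the real transport.
import queue
import threading
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

pending = queue.Queue()          # changes wait here while offline

class SyncHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            pending.put(event.src_path)

    on_created = on_modified     # treat new files like modified ones

def upload_to_cloud(path: str) -> bool:
    """Hypothetical uploader; returns False if the cloud is unreachable."""
    print(f"uploading {path}")
    return True

def uploader() -> None:
    while True:
        path = pending.get()
        if not upload_to_cloud(path):
            pending.put(path)    # postpone until a connection appears
            time.sleep(30)

observer = Observer()
observer.schedule(SyncHandler(), path="/home/user/Photos", recursive=True)
threading.Thread(target=uploader, daemon=True).start()
observer.start()                 # the observer thread keeps the agent alive
```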


Synchronization suits most home users, since they deal mostly with independent data: documents and photos. Moreover, these files are not large, so they have time to upload to the cloud or download from it during the user's session at the computer. That is why, in the consumer segment, the market for protecting individual files is gradually giving way to synchronization.

System backup
Users who have spent many an hour setting up their system and installing their favorite applications understand that time is money. Even if all those applications can in principle be restored from distribution disks, the time spent on configuration cannot be returned. Hence the need to protect the system: to create a backup of one's efforts. In this light, configuring a system is like writing a document, only far more complex and larger. This scenario is especially vivid in the business segment, where recovery time translates directly into financial losses. Thus we get the disaster recovery scenario: the ability to quickly get a working image of the system. Here data consistency plays an important role, and the volumes are larger than in the synchronization case. If the channel were wide, as with backup to a local disk, then sector-by-sector imaging from a snapshot would be the simplest and cheapest solution. But the channel to the cloud is neither wide nor reliable.

Over such a channel, uploading hundreds of gigabytes takes hours, or even days. Incremental copies help fight large volumes, but there is no escaping the initial full upload. Corporate clients may still tolerate such times, but a home user will switch the computer off on the first day. And even patient corporate clients will not be thrilled to keep a snapshot open for several days (Lord, just admit the word "snapshot" into Russian already), doubling their I/O operations. This problem has two solutions:
  1. Initial seeding
  2. Resume

Initial seeding
The essence of this solution is to transfer the initial bulk of data to the data center not over a slow and unreliable channel like HTTP, but over a channel that is fast (for large volumes) and reliable: real, physical mail (reliable as long as it is not the Russian Post). That is, the user copies the backup to a physical disk and ships it via a courier like DHL to the data center, where it is unpacked; after that the user uploads only the small incremental changes to his data over the usual WAN. That a physical channel can be quite fast is confirmed by the well-known joke about the insane bandwidth of a KamAZ truck loaded with hard drives and accelerated to 60 km/h. By the way, when my brother brought the drive to our parents' home, he was essentially doing exactly this initial seeding. Since shipping and tracking disks is a rather labor-intensive task, and there are very many home users (Acronis has more than five million of them), this solution did not suit Acronis in the prosumer segment for purely organizational reasons. In the corporate segment, however, it fit well and is applied there successfully.
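The joke survives arithmetic. A rough estimate, with purely illustrative assumptions about the truck's cargo and the comparison line:

```python
# The KamAZ joke survives arithmetic (illustrative assumptions).
disks = 1000                  # 4 TB drives loaded into the truck
payload_tbit = disks * 4 * 8  # total payload in terabits
trip_seconds = 10 * 3600      # ~600 km at 60 km/h

truck_gbit_s = payload_tbit * 1024 / trip_seconds
print(f"truck: ~{truck_gbit_s:.0f} Gbit/s")      # ~910 Gbit/s

wire_days = payload_tbit * 1024 / 1 / 86400      # same payload at 1 Gbit/s
print(f"1 Gbit/s wire: ~{wire_days:.0f} days")   # ~379 days
```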


Resume
Resume is the ability to interrupt a backup partway through and continue it later without serious extra overhead. This is where sector-by-sector copying fails: a half-uploaded disk image is about as useful as half a pregnancy. The disk either uploads completely or does not upload at all. Snapshots, used to obtain a consistent state of the disk being copied, do not survive a reboot, so a mechanism is needed for more granular copying of data that does not lose the bare-metal image of the machine.

Hybrid imaging

True Image 2014 is the first Acronis product (and the first in the world!) to implement hybrid imaging. The algorithm is as follows:
  1. Take a snapshot of the file system
  2. The imaging technology copies an almost empty volume with basic boot information (this image does not exceed 200 KB)
  3. Files from the snapshot are incrementally copied to the cloud

If the connection drops midway, the backup is considered incomplete, but the files already uploaded to the cloud are available for download and recovery. On the next run, if the files have not changed (and their hashes tell us so), they are not re-uploaded and are counted toward the incremental backup being built. Resume implemented! A minimal sketch of this hash-based resume logic follows.
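The sketch below is an illustration of the idea, not the actual True Image protocol: it assumes the cloud can report the hashes of files it already holds, and `upload` stands in for the real transport.

```python
# Sketch of hash-based resume (an illustration, not the actual
# True Image protocol). Assumption: the cloud can report the hashes
# of files it already holds, so a restarted backup skips them.
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def upload(path: Path, digest: str) -> None:
    """Hypothetical transport call; replace with the real uploader."""
    print(f"uploading {path} ({digest[:8]}...)")

def resume_backup(snapshot_root: Path, cloud_hashes: set) -> list:
    """Upload only files the cloud does not hold yet; files already
    present still count toward the incremental backup being built."""
    sent = []
    for path in snapshot_root.rglob("*"):
        if not path.is_file():
            continue
        digest = file_hash(path)
        if digest in cloud_hashes:
            continue                      # uploaded earlier: skip
        upload(path, digest)
        cloud_hashes.add(digest)
        sent.append(path)
    return sent
```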

By the way, since we have a volume image with boot information, besides restoring individual files it is possible to restore the whole system to bare metal from bootable media. Acronis in its element!

Russia and the cloud

Given all of the above, the question arises: are these goodies available to Russian home users? For anyone who reads the marketing materials, the answer is no. Let's understand why not before looking for ways to try it in Russia anyway. The fact is that today our cloud storage is deployed in the USA (St. Louis and Boston) and Europe (Strasbourg), and we do not yet have a data center in Russia.

Deliberately copying Russian users' data outside the Russian Federation runs into legal acts and restrictions, so before offering such a capability, our still-nascent cyber laws had to be studied thoroughly. One way or another, rumor has it that nothing illegal was found in our legislation, and in one of the upcoming updates the European cloud will become available to Russian home users as well. And if there turn out to be many such users, perhaps their data will eventually migrate back to the homeland!
As for those home users of True Image 2014 who want to try the cloud right now while in Russia, there are no technical limitations: they can set some other European country (Poland, for example) in their online account and use the cloud. Corporate users of Acronis Backup & Recovery 11.5 need no special contortions at all: everything works out of the box.

Conclusion

Source: https://habr.com/ru/post/196566/

