In this article, I discuss the availability of data stored in the cloud. The second (and most interesting) item on the agenda is the privacy and security of that data.

I should note right away that this part deals more with the theoretical side of the question than with the practical one. Suppose a user is going to back up their data to the cloud. What problems might arise, and how can they be avoided? To begin with, it is worth repeating the obvious: why would the user want this type of backup at all?
Motivation
First, cloud providers promise high availability of data (availability for short). Second, it is assumed that the data, even if it is unavailable at a given moment, is still stored somewhere and will become available later. This property is called durability.
An additional important advantage is that the stored data is accessible from anywhere in the world, as long as there is a working Internet connection.
So, the three main advertised benefits are:
- High availability
- High durability
- Flexibility: access data from anywhere in the world with an internet connection
Are the claimed properties true?
Unfortunately, we do not live in a perfect world. All the technical tools and efforts that providers use to ensure the reliability and availability of data, even the most remarkable ones, can easily be undone. The causes vary widely, and the probability that one of them occurs is quite high. Software errors, a bad batch of purchased hardware, or simple human mistakes by staff can leave the data unavailable for an unacceptably long time (that is, availability suffers). And that is only the beginning, because complete loss of data is also possible (in that case durability suffers).
To avoid making unsubstantiated claims, here are several incidents from recent years. Let's start with availability. The well-known Amazon S3 is a service with a stated availability of "three nines" (99.9%), which means the service may be offline for no more than about 10 minutes a week. In 2008 it had two serious outages: in June the service was unavailable for 8 hours, and in February access problems lasted about 3 hours. These are not the only cases: the same service most recently went offline last year, in 2011.
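The "three nines" arithmetic is easy to check. Below is a minimal sketch; the function name `downtime_budget_minutes` is my own, and it assumes the stated availability is simply averaged over the chosen period, which may differ from how a provider's agreement actually measures it.

```python
def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Maximum downtime allowed in a period for a given availability fraction."""
    return (1.0 - availability) * period_minutes


MINUTES_PER_WEEK = 7 * 24 * 60    # 10080
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600

# Assumption: "three nines" is read as an availability fraction of 0.999.
weekly_budget = downtime_budget_minutes(0.999, MINUTES_PER_WEEK)
yearly_budget_hours = downtime_budget_minutes(0.999, MINUTES_PER_YEAR) / 60

print(f"{weekly_budget:.2f} minutes per week")
print(f"{yearly_budget_hours:.2f} hours per year")
```

Under this reading the budget is roughly 10 minutes a week, or about 8.76 hours a year, so the two 2008 outages together (about 11 hours) already exceed a whole year's allowance.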
An outage at the cloud service FlexiScale lasted several days in total. Incidentally, it was caused by an employee's mistake: he accidentally deleted one of the data stores. Fortunately for FlexiScale users, the data was subsequently restored.
Users of the Carbonite cloud storage service were not so lucky. In 2009, the operator lost the data of many customers. Carbonite blamed its hardware suppliers for the incident. Whether that is true matters little to us; what matters is the fact of the loss. The provider Linkup ceased to exist altogether after losing the data of most of its customers.
Incidentally, the possibility that a provider simply ceases to exist, along with your data, is another potential threat, especially if the data needs to be stored for a very long time.
Cause
What is the main source of the problems described? The fact that a single provider is used. Of course, it is good that the provider has redundancy at the server level and stores the data in multiple copies. However, the cloud is run by one operator, uses one type of software, and is almost certainly managed centrally, and any of these can negate all of the above measures.
What to do?
The answer is obvious. If the availability and survival of the data are not that important, do not worry. Or keep another copy of the data locally.
If:
- you want to store the data for a very long time
- there is no way to also store it locally
- for any other reason you need to store the data on the network,
it is worth using several cloud storage services simultaneously, something like in the teaser picture for this post.
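A rough estimate shows why several providers help. The sketch below assumes that providers fail independently of each other, which is exactly the property a single centrally managed cloud cannot offer; the function name `combined_availability` and the 0.999 figures are illustrative assumptions, not measured values.

```python
def combined_availability(availabilities):
    """Probability that at least one provider is reachable.

    Assumption: outages at different providers are statistically independent.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


# Two hypothetical providers, each with "three nines".
two_providers = combined_availability([0.999, 0.999])
print(f"{two_providers:.6f}")
```

With two independent "three nines" providers, the chance that both are down at once is about one in a million. Real outages are never perfectly independent, so this is an optimistic upper bound rather than a guarantee.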
Using several providers requires data redundancy, that is, storing redundant data.
The easiest way to get it is data replication: keeping a full copy of the data with each provider.
Despite its simplicity and familiarity, this method has several significant drawbacks. First, it takes up too much space on the servers. Second, it spends too much traffic on saving data, and saving data to the servers (and updating it) is precisely the most frequent operation when backing up.
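Both the benefit and the cost of replication can be seen in a small simulation. This is only a sketch: `InMemoryProvider` is a hypothetical stand-in for a real storage service, and no actual provider API is used.

```python
class InMemoryProvider:
    """Hypothetical stand-in for a cloud storage provider."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blobs = {}
        self.bytes_uploaded = 0

    def put(self, key, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blobs[key] = data
        self.bytes_uploaded += len(data)

    def get(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.blobs[key]


def replicated_put(providers, key, data):
    """Store a full copy of the data on every provider."""
    for provider in providers:
        provider.put(key, data)


def replicated_get(providers, key):
    """Return the first copy that can be read, skipping failed providers."""
    for provider in providers:
        try:
            return provider.get(key)
        except (ConnectionError, KeyError):
            continue
    raise RuntimeError(f"no provider could serve {key}")


providers = [InMemoryProvider(name) for name in ("A", "B", "C")]
backup = b"x" * 1000

replicated_put(providers, "backup.tar", backup)
providers[0].online = False  # simulate an outage at provider A

restored = replicated_get(providers, "backup.tar")
total_uploaded = sum(p.bytes_uploaded for p in providers)
print(restored == backup, total_uploaded)
```

The backup survives the outage of one provider, but a 1000-byte backup costs 3000 bytes of upload traffic and 3000 bytes of storage: with n providers, replication multiplies both by n.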
Fortunately, there are other approaches that allow you to save on both disk space and traffic, while not reducing the level of data availability.
I will write about them in detail in the next part. The third part will be devoted to security issues.