An attempt to design a 3-10 PB storage system (3,000-10,000 hard drives) exposed a flaw in the design of OpenStack Swift. As it turns out, OpenStack Swift (and perhaps other similar systems) does not scale indefinitely and uses hardware wastefully. With a large number of disks (more than 3,000), data loss is almost inevitable.
The OpenStack developers suggest keeping 3 copies (the default). Where does this figure come from? Why not 2, 4 or 5? They provide no method for calculating reliability.
Where did the idea of 3 copies come from
In general (the 3-2-1 rule), 3 copies are kept under the following assumptions:
1. The first (working) copy is built on high-performance hardware. This is the primary storage; most operations hit the working copy.
2. The second copy is geographically separated from the first. The hardware is less powerful, and data is replicated asynchronously. If the primary storage fails, the second copy takes over the load.
3. The third copy protects against human error (it should be managed by different administrators and use software and hardware different from the first two copies).
In other words, the principle of "don't put all your eggs in one basket."
In Swift, however, all the data lives in one place and runs on the same software. Nevertheless the emphasis is still on 3 copies (configurations with a different number were not tested).
Paradox of copies
If the storage keeps 3 copies (200% redundancy), you can lose any 2 disks and the data is still guaranteed to survive. With a small number of disks this is the only reasonable scheme. But extending it to a large number of disks is a pointless waste of resources; more advanced methods than plain copying (error-correcting codes) should be used.
A dual-parity scheme (RAID6) also survives the death of 2 disks, yet its redundancy is only 20-40%. Other redundancy algorithms exist as well, but for some reason OpenStack does not use them.
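To make the comparison concrete, here is a small sketch (my own illustration, not code from Swift or any vendor) that computes the redundancy overhead of N full copies versus a layout of k data blocks plus m parity blocks. The specific (k, m) pairs are chosen only to reproduce the 20-40% and 35% figures mentioned in this article.

```python
# Redundancy overhead: extra bytes stored per byte of user data, in percent.
# The (k, m) pairs below are illustrative, not actual Swift or Cleversafe settings.

def replication_overhead(copies: int) -> float:
    """N full copies mean (N - 1) extra bytes per user byte."""
    return (copies - 1) * 100.0

def parity_overhead(k: int, m: int) -> float:
    """k data blocks plus m parity blocks (RAID6 is m = 2; Reed-Solomon allows any m)."""
    return m / k * 100.0

print(f"3 full copies:         {replication_overhead(3):.0f}%")   # 200%
print(f"RAID6, 8 data + 2:     {parity_overhead(8, 2):.0f}%")     # 25%
print(f"RAID6, 5 data + 2:     {parity_overhead(5, 2):.0f}%")     # 40%
print(f"Reed-Solomon, 20 + 7:  {parity_overhead(20, 7):.0f}%")    # 35%
```

Every one of these layouts survives the loss of any 2 devices in a stripe (the 20+7 code survives 7), which is exactly the point: the same or better fault tolerance at a fraction of the overhead.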
Suppose the storage has 3,000 disks. According to Google's statistics, 1-5% of disks fail per year. The simultaneous failure of 3 disks then becomes commonplace and should happen several times a year (3-6 times; I omit the calculations). And if the copies are scattered across the disks at random, as Swift does, some files will almost certainly have all three copies on the three dead disks.
Increasing the number of copies further looks absurd: 300% redundancy? 400%?
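Here is a back-of-envelope sketch of that claim. It assumes each object's three replicas are placed uniformly at random on three distinct disks and an illustrative count of one billion objects; real Swift places replicas per partition, so treat this only as an order-of-magnitude estimate.

```python
from math import comb, exp

N_DISKS = 3000                 # cluster size from the text
N_OBJECTS = 1_000_000_000      # assumed object count, purely illustrative
TRIPLE_FAILURES_PER_YEAR = 4   # "3-6 times a year" from the text, middle value

# Probability that one object's 3 replicas sit exactly on the 3 disks that died,
# assuming uniform random placement over distinct disks.
p_object = 1 / comb(N_DISKS, 3)

expected_lost = N_OBJECTS * p_object        # expected objects lost per such event
p_loss_per_event = 1 - exp(-expected_lost)  # Poisson approximation
p_loss_per_year = 1 - (1 - p_loss_per_event) ** TRIPLE_FAILURES_PER_YEAR

print(f"3-disk combinations:                {comb(N_DISKS, 3):.2e}")  # ~4.5e9
print(f"expected objects lost per event:    {expected_lost:.2f}")     # ~0.22
print(f"P(lose data in one triple failure): {p_loss_per_event:.0%}")  # ~20%
print(f"P(lose data within a year):         {p_loss_per_year:.0%}")   # ~59%
```

With more objects or a horizon longer than one year, the loss becomes practically certain.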
Cleversafe
This company has taken a more thoughtful approach to building storage and proposes a design of as much as 10 EB, using Reed-Solomon codes for reliability. Their redundancy is 35%, and that is with 4,500,000 disks!
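For comparison, here is a sketch of why a wide Reed-Solomon code is so much harder to kill than 3 full copies. The 20 data + 7 parity split is my own choice, picked only because it gives the 35% redundancy quoted above; Cleversafe's real parameters are not stated here. A stripe is lost only when more than m of its k + m fragments land on simultaneously failed disks, which is a hypergeometric tail probability.

```python
from math import comb

def p_stripe_lost(n_disks: int, failed: int, k: int, m: int) -> float:
    """P(more than m of the k+m randomly placed fragments sit on failed disks)."""
    width = k + m
    total = comb(n_disks, width)
    bad = sum(comb(failed, i) * comb(n_disks - failed, width - i)
              for i in range(m + 1, min(width, failed) + 1))
    return bad / total

# Plain 3-copy replication is the degenerate case k = 1, m = 2.
print(f"3 copies,  3 dead disks: {p_stripe_lost(3000, 3, 1, 2):.1e}")   # ~2.2e-10 per object
print(f"RS 20+7,   3 dead disks: {p_stripe_lost(3000, 3, 20, 7):.1e}")  # 0: needs more than 7 failures
print(f"RS 20+7,  10 dead disks: {p_stripe_lost(3000, 10, 20, 7):.1e}") # still vanishingly small
```

The trade-off is that a wide code reads from more disks during repair, but the durability per percent of redundancy is dramatically better than plain copying.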
The bottom line
- OpenStack promotes itself with the big word "cloud" and the backing of Rackspace and NASA, and other well-known companies are joining the project. Yet things may not be so rosy: infinite scalability is not there today.
- Three or more copies can also be used to increase storage performance; Facebook, for example, uses HDFS with about 30 PB of storage. But who would ever announce information about lost data?
- CleverSafe said (4 years ago) that their code was GPL. Maybe someone has kept the source?
More detailed calculations are in my article in English.