
Recently, quite a lot of articles on the theme of “Why RAID-5 is Bad” have begun to appear in the computer press (for example, one, two, and others).
I will try, without diving into the engineering and terminological jungle, to explain why RAID-5 seemed to work until now, yet has suddenly stopped.
The capacity of hard drives has been growing for the past several years with no particular tendency to stop. But while disk capacity nearly doubles every year, disk speed, that is, the data transfer rate, grows by mere percentage points over the same period. Yes, disks have gained SATA and then SATA-II interfaces, and SATA-III is already on the way, but have the disks themselves actually gotten faster, or have they merely received a new interface with nice round theoretical “maximum speed” figures, like the top number on a Zaporozhets speedometer?
Practice tells us: no.
If we compare the performance of mass-market SATA disks over several years, especially on small random operations, we see no increase remotely comparable to the growth in capacity. Capacity grows many times over; speed does not.
When RAID-5 appeared in 1987, a typical hard disk was 21MB in size and spun at 3600 RPM. Today a typical SATA drive is 1TB: a roughly 50-thousand-fold increase in capacity! Rotation speed, meanwhile, has merely doubled.
If the data transfer rate had grown over the years at the same pace as capacity, today's drives would be transferring data at around 30 gigabytes per second.
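As a sanity check, here is that arithmetic in a few lines of Python; the ~0.6 MB/s transfer rate for a late-1980s disk is my assumed ballpark figure, not a number from any vendor datasheet:

```python
# Back-of-the-envelope check of the capacity-vs-speed numbers above.

MB = 10**6
TB = 10**12

capacity_1987 = 21 * MB      # a typical disk when RAID-5 appeared
capacity_today = 1 * TB      # a typical SATA disk today

growth = capacity_today / capacity_1987
print(f"capacity growth: ~{growth:,.0f}x")   # ~47,619x, i.e. ~50 thousand times

rate_1987_mb_s = 0.6         # assumed ballpark sustained rate of a late-80s disk
rate_scaled_gb_s = rate_1987_mb_s * growth / 1000
print(f"rate, had it scaled the same way: ~{rate_scaled_gb_s:,.0f} GB/s")  # ~29 GB/s
```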
Now let's recall what RAID is, and what its RAID-5 implementation looks like.
RAID, or Redundant Array of Independent Disks, is a way of organizing a group of disks into a fault-tolerant structure, so that the information remains available even if some of those disks are damaged or fail completely.
Among the many types of RAID described “in theory,” essentially three occur in nature. These are RAID-0 (the “striped group,” whose “RAID” title is really only nominal, since it has no fault tolerance at all, which is what the 0 indicates), RAID-5, the “striped group with parity,” and RAID-1, the “mirror.” In its pure form RAID-1 is almost never used because of its speed limitations, so high-performance arrays use its combination with RAID-0. In this alliance RAID-0 gains fault tolerance and RAID-1 gains speed. The combination is usually called RAID-0+1 or RAID-10, “striping with mirroring.”
RAID-10 is good in almost every way: it is both reliable and fast. Except for the fact that building it eats 50% of the total disk capacity, a full half. A rather gangster percentage.
It is this rather cruel percentage that often pushes users of servers and storage systems toward RAID-5 as the alternative.
Indeed, in RAID-5 we pay for fault tolerance with the capacity of just one disk: the usable capacity of a RAID-5 group is (n-1) * hddsize, where n is the number of disks and hddsize is their size.
The data is “spread” across all the disks of the RAID group, and its blocks are supplemented with service information that makes it possible to recover from the loss of any single disk. This service information does not occupy a dedicated disk; it takes up a portion of the group's volume equal to the capacity of exactly one disk, and it too is spread across all the disks.
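To make the trade-off with RAID-10 concrete, here is a minimal sketch of both capacity formulas (the 6-disk, 1TB-per-disk group is just an illustrative example):

```python
# Usable capacity of RAID-5 vs RAID-10, per the formulas in the text.

def raid5_usable_tb(n_disks: int, disk_tb: float) -> float:
    """RAID-5: (n - 1) * hddsize; one disk's worth goes to parity."""
    return (n_disks - 1) * disk_tb

def raid10_usable_tb(n_disks: int, disk_tb: float) -> float:
    """RAID-10: half of everything goes to the mirrors."""
    return n_disks * disk_tb / 2

print(raid5_usable_tb(6, 1.0))    # 5.0 TB out of 6 -- we pay with one disk
print(raid10_usable_tb(6, 1.0))   # 3.0 TB out of 6 -- we pay with half
```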
When one of the disks of a RAID-5 group fails (fully or partially), the group goes into a degraded state, but our data remains available, since the missing part can be reconstructed from the redundancy in that “extra volume the size of one disk.” True, the speed of the group usually drops sharply, because every read and write now incurs extra operations to compute the redundancy and restore data integrity. If we put a new disk in place of the failed one, a smart RAID controller will start the rebuild procedure: it will read the surviving data from all the remaining disks and, using the redundant information, fill the new, empty disk with the contents that died along with the failed one.
If you have never encountered a RAID-5 rebuild, you may be unpleasantly amazed at how long the process can take. Its duration depends on many factors: besides the number of disks in the RAID group and how full they are, it depends heavily on the power of the RAID controller's processor and the read/write performance of the disks. And also on the workload hitting the array during the rebuild, and on the priority given to the rebuild process relative to that workload.
If you are unlucky enough to lose a disk at the height of the working day or the working week, the rebuild, slow as it already is, can stretch out tenfold.
And with the release of ever more capacious disks, whose performance, as we remember, barely grows in comparison with capacity, rebuild times are growing at an alarming rate: as noted above, the read speed of the disks, on which the speed of the rebuild directly depends, grows far more slowly than the capacity that has to be processed.
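A rough illustration of why: even the best case, streaming one disk's worth of data at full sequential speed with nothing else going on, grows linearly with capacity. The 100 MB/s sustained read rate below is an assumed typical figure for a 7200 RPM SATA disk, and the real-world reports that follow are roughly an order of magnitude worse than this idealized bound:

```python
# Idealized lower bound on rebuild time: the time just to stream one
# full disk sequentially. Real rebuilds under workload are far slower.

def min_rebuild_hours(disk_tb: float, read_mb_s: float = 100.0) -> float:
    """Best-case hours to read one full disk at a given sustained rate."""
    return disk_tb * 10**6 / read_mb_s / 3600

for size_tb in (0.5, 1.0, 2.0, 4.0):
    print(f"{size_tb} TB disk: at least {min_rebuild_hours(size_tb):.1f} h")
```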
It is easy to find stories on the Internet of a relatively small 4-6 disk RAID-5 built from 500GB disks taking 24 hours or more to restore the data onto a new disk.

“A RAID 5 array with 500 GB SATA drives [takes] approximately 24 hours to rebuild” (source: Adaptec)
“The SATA disks used for a 3.5TB array are configured as RAID 5 ... 3ware took a workout.” (source)
“I'm now at 80% of rebuilding my RAID-5 array with 3x 1TB harddrives, I've calculated that I've got 66 hours!” (source)
“On my filer I run a software raid 5 across eight 500 GB sata drives, which works great ... Recovery time is about 20 hours. Athlon X2 4200+ and nvidia chipset.” (source)
With today's one- and two-terabyte disks, these figures can easily be multiplied by another factor of 2-4!
And this is where the drama begins. The point, and it must be soberly understood, is that for the duration of a RAID-5 rebuild you are left not merely with a RAID that has no fault tolerance. For the entire rebuild you effectively have a RAID-0, whose reliability and fault tolerance are n times lower than those of a single disk, where n is the number of disks in the group. (I have decided to remove the openly controversial claims from the article :) I will gladly accept help from a mathematician competent in probability theory in correctly calculating the reliability figures; the main message about the unreliability of RAID-0 does not change.) In the event of any failure, even the smallest one, perhaps not even a complete disk failure but simply a read error caused by interference or cable problems, you lose all the information on the array.
Suppose so. But today's drives look pretty reliable, don't they? They already get through a day-long rebuild without a hiccup; things are not so bad, and we are hardly so unlucky as to have two disks fail back to back. It happens, but surely we'll get lucky?
Here is what the vendors themselves say about the reliability of their disks.
(Summary table for the main disk series)
Currently, almost all manufacturers produce hard drives of two main classes: the so-called Desktop drives, for desktop systems, and Enterprise drives, intended for servers and other critical applications. Enterprise-class drives are in turn divided into SATA (7200 RPM) and SAS or FC (with 10K and 15K RPM spindle speeds).
The reliability of the read process is usually measured by a parameter called BER, the Bit Error Rate: the probability of an unrecoverable error per a given number of bits read by the disk heads.
As a rule, Desktop-class disks carry a manufacturer-specified BER of 10^14, though for the larger disks, especially the newer series, a value of 10^15 is increasingly quoted. This number means the manufacturer predicts a failure probability of no worse than one bad bit per 10^14 bits read from the disk. A one with 14 zeros. A hundred thousand billion bits.
The figure seems huge. But is it really so great?
Simple calc.exe math tells us that 10^14 bits is only about 11TB of data. In other words, the hard drive manufacturer is effectively telling us that if we read approximately 11TB from a disk with a BER of 10^14, that is, an ordinary desktop-class disk, then from the manufacturer's point of view we will almost certainly hit a faulty bit somewhere. At least, that is what the manufacturer is counting on.
A bad bit read means a bad block, 512 bytes in size. And off we go.
Is 11 terabytes really that much anymore?
And this does not mean you have to read exactly 11TB; BER is only a probability, one that approaches certainty by the 11th terabyte read. On smaller volumes it simply decreases proportionally.
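Spelled out, the calc.exe arithmetic looks like this; note the article's “about 11TB” is the binary (TiB) reading of 10^14 bits:

```python
# How much data is 10^14 bits?

bits = 10**14
byte_count = bits / 8
print(f"{byte_count / 10**12:.1f} TB (decimal)")   # 12.5 TB
print(f"{byte_count / 2**40:.2f} TiB (binary)")    # 11.37 TiB, the ~11TB above
```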
Yes, disks with a BER of 10^15 have a tenfold better error probability (110TB read per bad bit), but this is only a temporary reprieve. As we remember, disk capacity doubles with each new generation, roughly every eighteen months to two years, RAID volumes grow accordingly, and a BER of 10^15 for SATA has only been reached in the last year or year and a half.
So, for example, for a 6-disk RAID-5 built from 1TB disks, the probability of a rebuild failing due to BER is estimated at 4-5%, and for 4TB disks it already reaches 16-20%.

Source: Hitachi Data Systems, “Why growing businesses need RAID-6.”
This cold figure means that with a 16-20 percent probability you will hit a disk failure during the rebuild (and therefore lose all the data on the RAID). For a rebuild the RAID controller, as a rule, has to read all the disks of the RAID group: for 6 disks of 1TB the stream read by the controller reaches 6TB; with 4TB disks it is already 24TB.
And 24TB, with a BER of 10^15, is roughly a quarter of 110TB.
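These percentages follow from modelling BER as a constant per-bit error rate (a simplification, but one that reproduces both figures quoted above):

```python
import math

def p_ure(data_read_tb: float, ber_bits: float = 1e15) -> float:
    """Probability of at least one unrecoverable read error (URE)
    while reading data_read_tb terabytes, at one error per ber_bits bits."""
    bits_read = data_read_tb * 10**12 * 8
    return 1 - math.exp(-bits_read / ber_bits)

print(f"6 x 1TB rebuild (6 TB read):  {p_ure(6):.1%}")    # ~4.7%,  the 4-5% figure
print(f"6 x 4TB rebuild (24 TB read): {p_ure(24):.1%}")   # ~17.5%, the 16-20% figure
```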
But even that is not all. As practice shows, roughly 70-80% of the data stored on disks is so-called cold data: files that are accessed relatively rarely. As disk capacities grow, the volume of such data grows in absolute terms as well. Huge amounts of data sit untouched by anyone, even by the antivirus (why would it scan gigabyte rips and mp3s?), for months, possibly years.
A read error that has crept into this cold data is discovered only when the entire disk contents are read, that is, during the rebuild.
Large and “smart” storage systems usually spend their idle cycles on so-called disk scrubbing, continuously reading and verifying the entire surface of the disks. But I am fairly sure your low-cost “home-grown” RAID controller does no such thing.
Consequently, you will learn about the bad block that appeared a week ago somewhere in the cold data at the very moment you sit, fingers crossed, watching the progress bar of the rebuild.
That's the unpleasant truth behind the somewhat scandalous articles about the "death of RAID-5."
Perhaps, for an archive of the, ahem, home video collection, losing it all in a matter of seconds will not be such a disaster, especially if you yourself look good in it. But for tasks even slightly more critical than “home storage of BD rips pulled off torrents,” it is definitely time to abandon RAID-5.
Conclusions (for those who couldn't get through all of the above):
- The sharp growth in disk capacity, combined with a much slower growth in per-disk transfer rates, has drastically lengthened RAID-5 recovery times, and they keep growing with each more capacious generation of disks. As a result, the window during which the data is left completely unprotected is no longer acceptable.
- The absence, in low-cost RAID-5 controllers, of any verification of cold data areas (data rarely updated or read) means that a long-standing read problem can surface at the worst possible moment, during the rebuild, when the array is completely unprotected against failures, and end in total data loss.
- The increased load on the disks during the recovery period itself raises the likelihood of a further failure.
- Modern Desktop-class disks have already approached, in sheer capacity, the BER (Bit Error Rate) limits specified by their manufacturers, which further increases the likelihood of a failure during a full read of the entire disk.
All of the above argues for no longer using RAID-5 as a fault-tolerant solution for storing important and critical data.
The solution:
- For data where access speed (and especially write speed) is not critical: RAID-6. A RAID type that survives the failure of two disks, so that while one failed disk is being rebuilt the array is still protected against random read errors. The drawback is a relatively low write speed.
- For data requiring the fastest possible access for both writes and reads: RAID-10. With RAID-10 the rebuild time drops drastically, since it does not require reading the full RAID volume, only copying the contents of the surviving “mirror” onto the replacement disk. The drawback: a high disk overhead for fault tolerance (a sketch comparing the rebuild read volumes follows this list).
- If at all possible, do not cut costs by storing critical information on Desktop-class disks that were never intended for RAID use; use the dedicated Enterprise server series, whose read reliability is one to two orders of magnitude higher.
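As promised above, here is a sketch of why the RAID-10 rebuild is so much cheaper in terms of data read, and hence in URE exposure (compare with the p_ure sketch earlier). Following the article's own accounting, the RAID-5 rebuild is counted as reading the whole group:

```python
# Data volume that must be read to rebuild one failed disk.

def rebuild_read_tb(raid_level: str, n_disks: int, disk_tb: float) -> float:
    if raid_level == "raid5":
        return n_disks * disk_tb   # read every disk of the group (per the article)
    if raid_level == "raid10":
        return disk_tb             # just copy the surviving mirror
    raise ValueError(raid_level)

print(rebuild_read_tb("raid5", 6, 4.0))   # 24.0 TB to read
print(rebuild_read_tb("raid10", 6, 4.0))  # 4.0 TB to read
```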