Server drives and RAID arrays: when are they worth the money, and when can you save?

The market offers a large number of drives of different speeds from different manufacturers. Not everyone understands which drive is best for which task, why it is sometimes better to pay more, and when it is safe to save. In this article I will try to clarify the main points and make the choice simpler. It should be useful not only to those who want to buy or rent a dedicated server, but also to those who want reliable data storage at home. After reading it, you should see why renting desktop-class hardware in low-cost data centers is not always advisable and why more reliable server hardware is often the better choice.

To begin with, all drives available on the market can be clearly divided into classes:

- ordinary desktop drives (used in home PCs, laptops and the desktop-based servers of low-cost data centers);
- server disks with a speed of 7200 revolutions per minute (RPM);
- Enterprise drives with speeds of 10,000 and 15,000 RPM;
- solid state drives.
We will cover the choice of solid-state drives in a separate article. Here we will focus mainly on hard drives and look at which drive is advisable to use where and when.

Let's start with ordinary desktop drives. These are fine drives with fairly large capacity and good performance, but their main drawback is that, by design, they are not meant to work in a RAID array. In these drives the vibration caused by spindle rotation is practically not compensated for. Of course, the vibration is minimal, and with 1-2 drives at home it is not a problem. In a server chassis with many drives, however, its influence can be quite significant: the drives shake each other, and resonance amplifies the effect. With 12 drives installed in one chassis and powerful server fans running at 5000-9000 RPM, the vibration level rises noticeably, and with it the percentage of errors and retries, which hurts performance. The performance of desktop drives drops many times over in these conditions, because they have considerable difficulty positioning the heads and lose the track. This is clearly visible in the well-known graph of drive performance versus vibration load.

SATA RE (RAID Edition) drives, that is, 7200 RPM server drives, are a different matter. They are less susceptible to vibration: according to the graph, the probability of a vibration-induced error is 50% lower for them.

Vibration is not the only problem, however. The other major problem for all drives is the rate of unrecoverable read errors. What does this mean in practice?

For desktop SATA drives, the unrecoverable error rate is 1 error per 10¹⁴ bits read, or 1 error per 12.5 TB of data. A 1 TB drive holds (1000 / 12500) × 10¹⁴ = 0.08 × 10¹⁴ bits, so 5 such drives hold 5 × 0.08 × 10¹⁴ = 0.4 × 10¹⁴ bits. The probability of hitting an error while these drives work in a RAID5 array is therefore roughly (0.4 × 10¹⁴) / 10¹⁴ × 100% = 40%.
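The estimate above can be reproduced in a few lines of Python. This is only a sketch: the function names are illustrative, decimal terabytes (10¹² bytes) are assumed, and bit errors are assumed to be independent. Alongside the linear estimate used in the text, it prints the probability obtained by treating every bit as an independent trial, which comes out somewhat lower (about 33%) but leads to the same conclusion.

```python
import math

BITS_PER_TB = 8 * 10**12  # assumption: decimal terabyte, 10^12 bytes of 8 bits


def linear_ure_estimate(disks, tb_per_disk, bits_per_error):
    """Linear estimate used in the text: bits read divided by bits per error."""
    bits_read = disks * tb_per_disk * BITS_PER_TB
    return bits_read / bits_per_error


def independent_ure_probability(disks, tb_per_disk, bits_per_error):
    """Probability of at least one error if each bit fails independently."""
    bits_read = disks * tb_per_disk * BITS_PER_TB
    p_bit = 1.0 / bits_per_error
    # 1 - (1 - p)^n, computed with log1p/expm1 to avoid floating-point rounding
    return -math.expm1(bits_read * math.log1p(-p_bit))


if __name__ == "__main__":
    # Five 1 TB desktop drives, 1 error per 10^14 bits, as in the text
    print(f"linear estimate:  {linear_ure_estimate(5, 1, 10**14):.0%}")
    print(f"independent bits: {independent_ure_probability(5, 1, 10**14):.0%}")
```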

As you can see, using 5 desktop drives in RAID5 is simply not an option: the probability of an unrecoverable error during a rebuild is very high, so sooner or later a rebuild will fail. We end up with an array that is bound to fail during a rebuild, and the data will be lost. I did not know about this before. In 2008, when I built my first server on desktop drives, I set up a RAID5 array to save disk space and money, and in less than a month the data was lost. Now I am surprised the array lived that long :)

Of course, you can use more reliable RAID levels such as RAID10 or, in extreme cases, RAID6, but with a large number of drives the probability of an unrecoverable error during a rebuild will still be rather high.

7200 RPM server drives, whether SATA RE or Near Line (NL) SAS, are a different matter. Thanks to their design, the probability of an unrecoverable error is an order of magnitude lower: 1 error per 10¹⁵ bits of data. However, when you use not only many drives but also large-capacity ones, even this may not be enough. In such cases you will still have to use Enterprise-class SAS drives, whose reliability level is 1 unrecoverable error per 10¹⁶ bits of data.
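To see why the 10¹⁵ class "may not be enough" for large arrays, the same linear arithmetic can be applied to all three error rates quoted so far. This is a rough sketch: the 12 × 4 TB configuration is a hypothetical example, not taken from the article, and the names are illustrative. The figure printed is the expected number of unrecoverable errors when every bit of the array is read once.

```python
BITS_PER_TB = 8 * 10**12  # assumption: decimal terabyte

# Error rates quoted in the text: one unrecoverable error per N bits read
DRIVE_CLASSES = {
    "desktop SATA": 10**14,
    "SATA RE / NL SAS": 10**15,
    "Enterprise SAS": 10**16,
}


def expected_errors(disks, tb_per_disk, bits_per_error):
    """Expected unrecoverable errors when the whole array is read once."""
    return disks * tb_per_disk * BITS_PER_TB / bits_per_error


if __name__ == "__main__":
    # (5, 1) is the array from the text; (12, 4) is a hypothetical larger one
    for disks, tb in [(5, 1), (12, 4)]:
        print(f"{disks} x {tb} TB:")
        for name, rate in DRIVE_CLASSES.items():
            print(f"  {name:<18} {expected_errors(disks, tb, rate):.4f}")
```

Under these assumptions the larger array on 10¹⁵-class drives lands at roughly the same level of risk as the small array on desktop drives.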

It is also worth noting that for SATA RE, Near Line (NL) SAS and Enterprise-class SAS drives, that is, for drives that can interact effectively with a RAID controller, the real probability of an unrecoverable error is much lower precisely because of this ability. When an array is under load (databases used by many users at once, active reading and writing), recoverable errors start to matter as well, and ordinary drives handle them inefficiently. They try to re-read the problem sector many times: Western Digital drives, for example, are set to 64 passes of the head with different height and angle parameters, and only then does the head move on to other requests. This greatly increases the response time, which RAID does not tolerate: the controller will conclude that the drive is lost and start a rebuild. The load on the array then becomes critical, because the rebuild runs on top of the normal workload. The result is predictable: the collapse of the entire array.

Drives designed for RAID can tell the RAID controller that they have a problem reading a data block, have the block requested from the other drives while they keep processing other requests, and, once the block arrives, write it to a different location on the problem drive. As a result, the performance of the RAID array does not drop and the probability of data loss is significantly reduced. Note, however, that not all chipset-integrated soft-RAID controllers are able to "understand" such drives. So RE drives alone are sometimes not enough for a reliable array: you also need a hardware controller or another platform that works correctly with RAID.

If you want to build storage that is more reliable than storage on desktop drives, you can also buy drives cheaper than RE ones, for example Constellation CS. They are designed to work exclusively with software RAID and do not have the main drawback of desktop drives (multiple re-read attempts to the detriment of other tasks). They do not, of course, fully interact with controllers, so RAID failures are not completely ruled out.

Regardless of which drives you use, remember that they have a cache of 32 MB, 64 MB or more. What does this mean for a RAID array? For performance, the cache is a plus for both reads and writes. For write reliability, it is a minus. With the cache enabled, the RAID controller believes the data has already been written to the array, while in fact it may still be sitting in the cache, to be written to the platters later. The total cache grows with the size of the array, and with 12 drives it already approaches a gigabyte. What happens to that data when the power goes out? Right: it is lost. For simple file storage this is probably not critical, but for databases things get interesting. For especially critical data such as databases it is therefore recommended to disable the write cache. This reduces disk performance by 8–15% under database workloads but significantly increases reliability. For the same reason, in large storage systems from major manufacturers the drive cache is disabled by default and cannot be enabled. When the same drives are used in servers, especially in low-cost data centers where the server's power supply is not redundant, you need to keep this risk in mind.
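The "almost a gigabyte" figure is simple multiplication, sketched below for the two cache sizes mentioned. The function name is illustrative, and the result is only an upper bound on data that may still be unwritten at the moment of a power loss, assuming every cache is completely full of pending writes.

```python
def total_drive_cache_mb(drives, cache_mb_per_drive):
    """Combined on-drive cache of an array, in megabytes."""
    return drives * cache_mb_per_drive


if __name__ == "__main__":
    for cache_mb in (32, 64):
        total = total_drive_cache_mb(12, cache_mb)
        print(f"12 drives x {cache_mb} MB = {total} MB "
              f"({total / 1024:.2f} GB)")
```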

Another key feature of Enterprise-class SAS drives is that data is stored on them even more reliably: the minimum sector size is 520 bytes rather than 512, with the extra 8 bytes used for parity checking. Many data-recovery algorithms run on the drive itself, without involving the controller. For this reason, the capacity of these drives is not very large.

Speaking of capacity, one last recommendation: if your task is to store data reliably, do not use drives larger than you need, because a rebuild will take longer. As a rule, controllers do not analyze how much of the drive is actually occupied and restore the entire drive, so the difference in recovery time between 1 TB and 6 TB drives will be more than 6 times.
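A back-of-the-envelope model shows the scale of the difference. It is a sketch under stated assumptions: the 100 MB/s sustained rebuild rate is a hypothetical figure, not a measurement, and the controller is assumed to rewrite the whole drive regardless of how much of it is used. At a constant rate the ratio is exactly 6. The model does not capture the effect of the normal workload competing with the rebuild, which is one reason real rebuilds of large drives take longer.

```python
def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Time to rewrite an entire drive at a constant rate, in hours."""
    capacity_mb = capacity_tb * 10**6  # decimal units: 1 TB = 10^6 MB
    return capacity_mb / rebuild_mb_per_s / 3600


if __name__ == "__main__":
    RATE_MB_PER_S = 100  # hypothetical sustained rebuild throughput
    for tb in (1, 6):
        hours = rebuild_hours(tb, RATE_MB_PER_S)
        print(f"{tb} TB drive: about {hours:.1f} h")
```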

Let's sum up. For a small RAID array, the most expensive Enterprise-class drives are not needed and give no practical advantage in reliability. Server drives, however, are highly desirable, since with them a rebuild is an order of magnitude more likely to complete successfully. Do not use larger drives than necessary, except when you need higher IOPS (some larger drives may still be faster thanks to a greater number of heads and platters). When you need both a large capacity with many drives and a sufficient level of reliability, look at NL SAS drives, which are essentially SATA RE drives fitted with a SAS interface and still spin at 7200 RPM. To raise reliability further, use a more fault-tolerant RAID level. When array capacity is not critical and maximum reliability is required, you should definitely use 15,000 RPM Enterprise SAS drives.

Now, when you choose a server to rent in the Netherlands, which we offer at the Switch site, either with the configurator at the bottom of the page http://www.ua-hosting.company/servers or by modifying one of the special offers, you will understand which drives and which server are best for which tasks, and when it is better to use drives in RAID and when to use them separately, distributing files in software according to popularity (a balancer script that depends on load). You will also see why 4 larger drives can be better than 12 smaller ones in terms of reliability, but worse in terms of recovery time during a rebuild.

And most importantly, you will see why our offer is really good for the server segment: we have brought the price close to that of desktop-based hosting while keeping reliability an order of magnitude higher, without exaggeration. So if you or your friends need a good server, welcome. The sale of some configurations from the list below is limited, and very soon their prices will be higher: we are generous, but not without limit :)

And if you have real experience of using particular drives for particular tasks, feel free to share it in the comments. Everything is of interest, right down to failure statistics. We will try to publish more material on this topic, as well as on the problems of choosing an SSD, later.

Source: https://habr.com/ru/post/305334/
