
Fault tolerance in Qsan storage systems

In today's IT infrastructure, with virtualization in widespread use, the storage system is the core on which all virtual machines depend. Failure of this node can bring an entire computing center to a halt. Although most server hardware includes some form of fault tolerance "by default", it is precisely because of the special role the storage system plays within the data center that it faces heightened "survivability" requirements.




The most effective way to ensure fault tolerance in IT is to run several copies of the hardware and software (in the simplest case, duplication). A storage system can, of course, be fully replicated, and for disaster recovery this is exactly the approach taken. But not every company can afford such a solution. The issue is not only the doubled cost of equipment, but also the other expenses involved in deploying such a solution and supporting it afterwards.


However, the option of duplicating equipment does not remove the need for fault tolerance at the component level. In a storage system, redundancy is applied to power supply units, cooling modules, drives and, of course, controllers. All of this has long been commonplace; it is hard to find a storage system built any other way, and Qsan is no exception. In this article, though, we want to talk about features that are not immediately obvious, yet are aimed primarily at improving the resiliency of the system as a whole.


Cooling modules


Storage systems in 2U-3U enclosures very often use combined modules that house both the power supply and the fans. On the one hand this is convenient, since only one unit needs to be serviced. On the other hand, if the cooling part fails, the power supply may be forcibly shut down to avoid overheating. That may not seem like the most critical scenario, but there is no reason to add such a vulnerability.


Cooling in Qsan storage systems is organized as separate hot-swappable modules, independent of the power supplies (the PSUs have their own fans, sized only to cool the PSUs themselves). Each cooling module contains two independent fans that back each other up, and there are two such modules in the enclosure, right and left, to blow air efficiently across all components. If one fan fails, all the others automatically increase their speed to compensate for the lost airflow. This is why a fan failure does not put the whole device at risk of overheating.
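The compensation logic described above can be sketched roughly as follows. This is a hypothetical illustration, not Qsan firmware: the RPM figures, function names, and the assumption that airflow scales linearly with fan speed are all ours.

```python
BASE_RPM = 6000   # nominal speed per fan (illustrative assumption)
MAX_RPM = 12000   # per-fan hardware ceiling (illustrative assumption)

def compensate(fan_ok: list[bool]) -> list[int]:
    """Return a target RPM per fan so total airflow stays near nominal."""
    total = len(fan_ok)
    alive = sum(fan_ok)
    if alive == 0:
        # No working fans left: shut down rather than overheat.
        raise RuntimeError("all fans failed")
    # Spread the lost airflow share across surviving fans, capped at the limit.
    target = min(MAX_RPM, BASE_RPM * total // alive)
    return [target if ok else 0 for ok in fan_ok]

# Four fans (two modules x two fans each), one failed:
print(compensate([True, True, False, True]))  # -> [8000, 8000, 0, 8000]
```

With all four fans healthy, each simply runs at the nominal 6000 RPM; losing one pushes the remaining three to 8000 RPM to keep the aggregate airflow roughly constant.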


Expansion Shelf Connection Topology


The classical scheme for connecting expansion shelves to a storage system uses a topology called a cascade: the corresponding controllers of the shelf and the storage system are linked by a single SAS cable, for a total of two cables in a dual-controller system. A second shelf, if needed, is connected to the first one in the same way, and so on. The advantage of this topology is that it is easy to implement in hardware. The disadvantage is its vulnerability to a sudden break in the SAS chain, caused either by a cross-failure of opposite controllers (one controller in the storage system and the other controller in a shelf), or by a power loss on one of the expansion shelves in the middle of the chain. The result is a loss of access to part of the drives and a possible collapse of a RAID group if it is "smeared" across several enclosures.


Against a cross-failure of controllers, Qsan provides protection in the form of an internal logical link between controllers through the storage system's backplane. That is, a storage controller sees not only the JBOD controller directly connected to it, but also the "neighbor" JBOD controller via a dedicated link in the backplane. As a result, if such a failure occurs and nobody physically pulls the SAS cables between the storage system and the shelf, access to all drives is preserved.



To protect against a break in the SAS chain, for example due to a power loss on an expansion shelf, a different connection topology is commonly used: the reverse cascade. Here the storage system is connected directly to both the first and the last shelf in the chain, gaining access to the drives from two sides, as it were.
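The benefit of the second path is easy to see in a toy connectivity model. The sketch below is a simplification we made up for illustration: shelves are numbered along the chain, the head unit reaches shelf 0 in a plain cascade and both ends in a reverse cascade, and a shelf that loses power breaks the walk through it.

```python
def reachable_shelves(n: int, failed: set[int], reverse_cascade: bool) -> set[int]:
    """Shelves whose drives remain accessible after `failed` shelves go dark."""
    ok: set[int] = set()
    # First path: walk the chain forward from shelf 0 until the break.
    for i in range(n):
        if i in failed:
            break
        ok.add(i)
    if reverse_cascade:
        # Second path: walk backward from the last shelf until the break.
        for i in range(n - 1, -1, -1):
            if i in failed:
                break
            ok.add(i)
    return ok

# Five shelves, shelf 2 in the middle loses power:
print(reachable_shelves(5, {2}, reverse_cascade=False))  # {0, 1}
print(reachable_shelves(5, {2}, reverse_cascade=True))   # {0, 1, 3, 4}
```

In the plain cascade, everything behind the dead shelf (shelves 3 and 4) goes dark; in the reverse cascade only the failed shelf itself is lost, which is exactly why a RAID group spread across enclosures survives.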



If you want even stronger protection, you can build larger configurations using, for example, a tree topology, or combine the topologies mentioned above. This is possible thanks to the large number of SAS connectors on the devices (2 per storage controller and 5 per JBOD controller) with automatic detection of input/output modes. The main thing is that the administrator does not get confused by the cabling; the storage system itself will then configure everything properly.


Fast rebuild


Having hot-spare drives in the system significantly increases the reliability of data storage. However, merely allocating such disks does not guarantee absolute protection. The recovery process (rebuild) is quite resource-intensive and often time-consuming, because access to the primary data never stops: alongside its current workload, the system also has to copy data to the new disk. The duration of a rebuild depends directly on the drive's capacity and speed characteristics. And since the system knows nothing about the space actually occupied on the disks, during a classic rebuild it simply copies everything, block by block.


As a result, rebuilding a modern high-capacity 10+ TB disk on a heavily loaded storage system can easily take a week or more. Keep in mind also that during a rebuild the probability of other drives failing rises significantly because of the increased load on them, and that can already pose a serious danger when using, for example, RAID 5.
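The "week or more" figure follows from simple arithmetic: a full block-by-block copy is bounded by capacity divided by the throughput left over for the rebuild. The throughput numbers below are illustrative assumptions, not measured Qsan values.

```python
def rebuild_hours(capacity_tb: float, effective_mb_s: float) -> float:
    """Hours to copy an entire drive at a given effective rebuild throughput."""
    total_mb = capacity_tb * 1_000_000  # decimal TB, as drive vendors count
    return total_mb / effective_mb_s / 3600

# An otherwise idle 10 TB drive rebuilding at ~150 MB/s:
print(round(rebuild_hours(10, 150), 1))  # -> 18.5 hours
# The same drive when production I/O leaves only ~15 MB/s for the rebuild:
print(round(rebuild_hours(10, 15), 1))   # -> 185.2 hours, i.e. almost 8 days
```

Even the optimistic case is most of a day, and under contention the window during which a second drive failure is fatal stretches past a week.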


To address this problem, many storage developers have worked on speeding up the recovery process. Different approaches are used, but the essence is the same: copy only the blocks that actually hold data during a rebuild. Qsan has not stood aside either. In this vendor's storage systems, when the Fast Rebuild option is enabled, the system tracks which blocks have been written to, and can therefore copy only those blocks to the new drive in the event of a disk failure.
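A minimal sketch of this "copy only written blocks" idea, under our own assumptions (a per-block boolean write map; real implementations track extents or chunks and live in firmware, and these class and method names are hypothetical):

```python
class Volume:
    def __init__(self, n_blocks: int):
        self.data = [0] * n_blocks
        self.written = [False] * n_blocks  # per-block write tracking map

    def write(self, block: int, value: int) -> None:
        self.data[block] = value
        self.written[block] = True         # the tracking cost paid on each write

    def rebuild_blocks(self, fast: bool) -> int:
        """How many blocks a replacement drive would receive."""
        if fast:
            return sum(self.written)       # only blocks that ever held data
        return len(self.data)              # classic rebuild: every block

vol = Volume(1_000_000)
for b in range(50_000):                    # the volume is only 5% written
    vol.write(b, 0xAB)

print(vol.rebuild_blocks(fast=False))  # -> 1000000
print(vol.rebuild_blocks(fast=True))   # -> 50000
```

On a volume that is 5% full, the fast path moves 20x less data, which is where the dramatic reduction in rebuild time comes from; the flip side, as the next section notes, is the bookkeeping on every write.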



The Fast Rebuild option is not enabled by default when creating new volumes, since using it affects performance, especially with random write operations, because:


  1. Every write to a block has to be tracked;
  2. While no rebuild is in progress, checksums are not calculated for unallocated space, so a new write to such an area must first "initialize" it.

Therefore, Fast Rebuild is not recommended for volumes hosting, for example, heavily loaded databases, or in video surveillance systems, where the volume will end up 100% full anyway. But for file or mail servers this option will be genuinely useful.


Instead of a conclusion


Every storage vendor claims that its devices are reliable. And if there were no fatal miscalculations in a device's design and no irresistible urge to cut corners during its production and testing, then on the whole you can agree with the vendor. However, you need to understand that even the most reliable hardware can fail.



At the same time, do not forget that no degree of storage reliability eliminates the need for backup copies, for clear and rehearsed recovery plans in case of an accident, and for responsive technical support from the vendor.



Source: https://habr.com/ru/post/459214/
