Bulking Up on Disk Mass Without Steroids: An Overview of the Western Digital Ultrastar Data102 Disk Shelf and Storage Configurations



What are large JBODs good for?


Western Digital's new 102-disk JBOD has turned out to be a powerful machine. Its design incorporates the experience of two previous generations of 60-disk shelves, and the result, the Data102, is unusually well balanced between capacity and performance for a giant of this class.

Why do we need such large disk enclosures at a time when hyperconverged systems are growing in popularity worldwide?

Workloads whose storage capacity requirements far exceed their compute requirements can inflate a customer's budget to incredible proportions. Here are just a few examples and scenarios:
  1. A replication factor of 2 or 3, typical when building scale-out systems, becomes an expensive proposition at a scale of several petabytes of data.
  2. Intensive sequential reads and writes push a cluster node beyond its local storage, which can lead to problems such as long-tail latency. You have to be extremely careful when designing the network.
  3. Distributed systems handle workloads like "many applications, each working with many of its own files" very well, but are mediocre at reads and writes against a tightly coupled cluster, especially in N-to-1 mode.
  4. For tasks like "double the retention depth of the video archive", adding a big JBOD is far cheaper than doubling the number of servers in the cluster.
  5. With external storage systems built on JBODs, we can explicitly allocate capacity and performance to our priority applications by reserving specific disks, caches, and ports for them, while keeping the necessary flexibility and scalability.

As a rule, disk shelves of the Data102 class are developed by the disk manufacturers themselves, who understand how to work with these disks and know all the pitfalls. In such devices vibration and cooling are handled properly, and power consumption matches actual data storage needs.

What is good about Western Digital's JBOD?


We are well aware that modular systems are limited in scalability by the capabilities of their controllers, and that the network always adds latency. At the same time, such systems cost less per IOPS, per GBps, and per TB of storage.

There are two things for which RAIDIX engineers came to love the Data102:

  1. The JBOD not only packs more than 1 PB of data into 4U; it is genuinely fast, and on streaming workloads it is not inferior to many all-flash solutions. 4U, 1 PB, and 23 GBps are good numbers for a disk array (see the quick check after this list).
  2. The Data102 is easy to maintain and requires no tools, not even a screwdriver.
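
For scale: 23 GBps spread across 102 spindles is about 230 MBps per disk, which is in line with the sequential rating of a modern high-capacity 7,200 rpm drive. A quick check (the drive count and throughput figure come from the text above):

```python
# Per-disk throughput implied by the vendor's streaming figure.
TOTAL_GBPS = 23          # aggregate streaming throughput of the shelf
DISKS = 102              # drives in one Ultrastar Data102

per_disk_mbps = TOTAL_GBPS * 1000 / DISKS
print(f"{per_disk_mbps:.0f} MBps per disk")   # ~225 MBps, a realistic HDD rate
```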

Our testing team hates screwdrivers so much that they dream about them at night. When they heard that HGST / WD was making a 102-disk monster, and pictured themselves dealing with 408 tiny screws, the nearest store ran out of strong alcohol.

Their fears were in vain. With engineers in mind, Western Digital came up with a new way of mounting the drives that makes maintenance easier. The disks attach to the chassis with retaining clips, with no bolts or screws. All disks are mechanically isolated by elastic mounts on the rear panel, and new servo firmware together with accelerometers compensates for vibration very well.

What's in the box?


The box contains the enclosure chassis populated with disks. You can buy it with as few as 24 disks, and the solution scales in sets of 12; this is done to ensure proper cooling and to manage vibration as effectively as possible.

Incidentally, it was the development of two supporting technologies, IsoVibe and ArcticFlow, that made the new JBOD possible.

IsoVibe consists of the following components:

  1. Specialized drive firmware that uses sensor data to control the servos and predictively reduce vibration.
  2. Vibration-isolated connectors at the rear of the enclosure (Fig. 1).
  3. And, of course, the screwless disk mounting.


Fig. 1. Vibration insulated connectors

Temperature is the second factor after vibration that kills hard drives. At an average operating temperature above 55 °C, a hard disk's mean time between failures drops to half its rated value.

Poor cooling particularly affects servers with many disks and large disk shelves. Often the rear rows of disks run more than 20 degrees hotter than the disks next to the cold aisle.

ArcticFlow is Western Digital's patented shelf-cooling technology. The idea is to add ducts inside the chassis that draw cold air to the rear rows of disks directly from the cold aisle, bypassing the front rows.


Fig. 2. The principle of operation of ArcticFlow

A separate stream of cold air is routed to cool the I/O modules and power supplies.

The result is an excellent thermal map of a shelf in operation. The temperature spread between the front and rear rows of disks is 10 degrees. The hottest disk runs at 49 °C with a cold-aisle temperature of +35 °C. Only 1.6 W is spent on cooling each disk, half of what comparable chassis require. The fans are quieter, vibration is lower, and the drives live longer and run faster.


Fig. 3. Thermal map of the Ultrastar Data102

Given the 12 W power budget per disk slot, the shelf can easily be made hybrid: 24 of the 102 disks can be SAS SSDs. They can be used in a hybrid mode, or you can configure SAS zoning and present them to a host that needs all-flash.

The box also includes a rack-mount kit. Installing the JBOD takes a couple of physically strong engineers. Here is what they will face:

The JBOD mounts and cabling are designed so that maintenance can be performed hot. Note also the vertical installation of the input/output modules (IOM).

Let's take a look at the system. The front is simple and clean.


Fig. 4. Ultrastar Data 102. Front view

One of the most interesting features of this JBOD is that the I/O modules are installed on top!


Fig. 5. Ultrastar Data 102


Fig. 6. Ultrastar Data 102. Top view


Fig. 7. Ultrastar Data 102. Top view without disks

At the back, the JBOD has 6 SAS 12G ports on each I/O module, giving 28,800 MBps of back-end bandwidth. The ports can be used to connect hosts, and some can be used for cascading shelves. Power comes through two inlets (80 PLUS Platinum rated 1600 W CRPS supplies).
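
As a sanity check on that figure: a wide SAS 12G port carries 4 lanes, and each 12 Gbps lane moves roughly 1,200 MBps of payload, so the six ports of one I/O module account for exactly the quoted number. A quick sketch:

```python
# Back-end bandwidth check for one Ultrastar Data102 I/O module.
LANES_PER_PORT = 4      # a wide SAS port carries 4 lanes
MBPS_PER_LANE = 1200    # 12 Gbps line rate with 8b/10b encoding -> ~1200 MBps
PORTS_PER_IOM = 6

iom_mbps = PORTS_PER_IOM * LANES_PER_PORT * MBPS_PER_LANE
print(f"{iom_mbps:,} MBps per I/O module")   # 28,800 MBps
```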


Fig. 8. Ultrastar Data 102. Rear View

Performance


As we said, the Data102 is not just huge, it is fast! Here are the results of the vendor's tests:

The vendor measured sequential and random load in parallel, from 12 servers and from 6 servers; the results are shown in Fig. 9 and Fig. 10 below.


Fig. 9. Parallel load from 12 servers


Fig. 10. Parallel load from 6 servers

Management


There are two ways to manage the JBOD in software:

  1. Via SES (SCSI Enclosure Services)
  2. Via Redfish

Redfish lets you locate components by lighting their LEDs, query component health, and update firmware.
Incidentally, the chassis supports T10 power disable (Pin 3) for powering off and resetting individual drives. This is useful when a single disk hangs the entire SAS bus.
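
As an illustration of the Redfish side, here is a minimal sketch in Python. The management address, credentials, and the exact chassis path are assumptions; the resource names follow the generic Redfish schema, so check the enclosure's actual service root before relying on them.

```python
# Minimal Redfish sketch: read chassis health and blink a locate LED.
# BASE, AUTH and the 'Enclosure' path are hypothetical placeholders.
import requests

BASE = "https://jbod-mgmt.example.local"   # assumed management endpoint
AUTH = ("admin", "password")               # placeholder credentials

# Walk the chassis collection and print each member's health status.
chassis = requests.get(f"{BASE}/redfish/v1/Chassis", auth=AUTH, verify=False).json()
for member in chassis["Members"]:
    ch = requests.get(f"{BASE}{member['@odata.id']}", auth=AUTH, verify=False).json()
    print(ch.get("Id"), ch.get("Status", {}).get("Health"))

# Blink the identify LED on the enclosure to find it in the rack.
requests.patch(f"{BASE}/redfish/v1/Chassis/Enclosure",
               json={"IndicatorLED": "Blinking"},
               auth=AUTH, verify=False)
```

On the SES side, the standard sg3_utils tools (sg_ses) can perform the same locate and status queries over the SAS path.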

Typical configurations


To use the capabilities of such a JBOD to maximum advantage, we need RAID controllers or software. This is where RAIDIX software comes to the rescue.

To create a fault-tolerant storage system, we need two storage nodes and one or more enclosures with SAS disks (SAS drives are dual-ported, so both nodes can reach them). If we do not need protection against node failure, or we use data replication instead, we can connect a single server to the enclosure and use SATA disks.

Dual-controller configuration


Virtually any x86 server platform can serve as a controller for RAIDIX-based storage: Supermicro, AIC, Dell, Lenovo, HPE, and many others. We are constantly certifying new hardware and porting our code to other architectures (for example, Elbrus and OpenPOWER).

As an example, let's take Supermicro platforms and aim for the highest possible throughput and computational density. When sizing the servers, the key resource is the PCI-E bus, which has to accommodate both the back-end and the front-end controllers.

We need controllers to attach the disk shelf: at least two AVAGO 9300-8e. Alternatives are a pair of 9400-8e, or a single 9405W-16e, though the latter requires a full x16 slot.

The next component is a slot for the cache-sync channel, which can be InfiniBand or SAS. (For tasks where bandwidth and latency are not critical, you can get by with synchronizing through the enclosure itself, without a dedicated slot.)

And, of course, we need slots for host interfaces, of which there must also be at least two.

In total, each controller needs at least five x8 slots (with no headroom for further scaling). For low-cost systems targeting 3-4 GBps per node, we can get by with just two slots.
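
A quick tally of that slot budget, as a sketch (the per-slot counts follow the text above; the PCIe figure is the usual approximation for a 3.0 x8 link):

```python
# Slot budget per RAIDIX controller, per the sizing described above.
slots = {
    "back-end HBAs (JBOD)": 2,   # e.g. a pair of AVAGO 9300-8e
    "cache-sync (IB/SAS)":  1,
    "front-end (hosts)":    2,
}
print(f"x8 slots per controller: {sum(slots.values())}")   # 5, no headroom

PCIE3_X8_GBPS = 7.9   # ~7.9 GBps usable per PCIe 3.0 x8 slot
print(f"Ceiling of the two back-end slots: {2 * PCIE3_X8_GBPS:.1f} GBps")
```

Even the two back-end slots alone leave comfortable headroom over the 3-4 GBps per-node target mentioned above.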

Controller Configuration Options


Supermicro 6029P-TRT
The controllers can be housed in two 2U 6029P-TRT servers. They are not the richest in PCI-E slots, but they use a standard motherboard without risers. These boards are also guaranteed to accept Micron NVDIMM-N modules, which protect the cache against power failure.

To connect the disks we take Broadcom 9400-8e HBAs. Dirty cache segments will be synchronized over 100 Gb InfiniBand.

Attention! The following configurations are designed for maximum performance with every available option, so for your specific task the specification can be trimmed considerably. Contact our partners.

The resulting system configuration:
| No | Name | Description | P/N | Qty per RAIDIX DC |
|----|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 6029P-TRT | SYS-6029P-TRT | 2 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 12 |
| 4 | System disk | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 4 |
| 5 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (Red tab) | MCP-220-00118-0B | 4 |
| 6 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 7 | HBA for JBOD connection | Broadcom HBA 9400-8e Tri-Mode Storage Adapter | 05-50013-01 | 4 |
| 8 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5 m | — | 1 |
| 9 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1 m | MCP1600-E001 | 2 |
| 10 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 11 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack | — | 8 |
| 12 | JBOD | Ultrastar Data102 storage enclosure | — | 1 |
| 13 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |

Here is an approximate diagram:


Fig. 11. Configuration based on Supermicro 6029P-TRT

Supermicro 2029BT-DNR
If we want to compete for space in the server room, the storage controllers can be based on a Supermicro Twin, for example the 2029BT-DNR. Each node has 3 PCI-E slots and one IOM module, and among the IOM options is the InfiniBand we need.

Configuration:
| No | Name | Description | P/N | Qty per RAIDIX DC |
|----|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 2029BT-DNR | SYS-2029BT-DNR | 1 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 12 |
| 4 | System disk | Supermicro SSD-DM032-PHI | SSD-DM032-PHI | 2 |
| 5 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 6 | HBA for JBOD connection | Broadcom HBA 9405W-16e Tri-Mode Storage Adapter | 05-50044-00 | 2 |
| 7 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5 m | — | 1 |
| 8 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1 m | MCP1600-E001 | 2 |
| 9 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 10 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack | — | 8 |
| 11 | JBOD | Ultrastar Data102 storage enclosure | — | 1 |
| 12 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |

Here is an approximate diagram:


Fig. 12. Configuration based on Supermicro 2029BT-DNR

1U platform
There are often tasks that require maximum density for large volumes of data but do not require, say, full controller fault tolerance. In that case we take a 1U system as the base and connect the maximum number of disk shelves to it.

Scale-out system


As a final exercise in our workout, let's build a horizontally scalable system based on HyperFS. To begin, we select two types of controllers: one for data storage and one for metadata storage.

The SuperMicro 6029P-TRT will serve as the data storage controllers.

To store the metadata, we use several SSDs in the enclosure, combine them into a RAID, and present it to the MDC over the SAN. A single storage system can cascade up to 4 JBODs, so one deep rack holds X PB of data under a single namespace.
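
For a rough sense of scale, here is a sketch of the raw capacity of such a cascade. The per-drive capacity is an assumption: 12 TB drives are what give the ">1 PB per 4U" figure quoted earlier.

```python
# Raw-capacity estimate for a cascade of four Data102 shelves.
DISKS_PER_SHELF = 102
TB_PER_DISK = 12     # assumed drive size, consistent with ">1 PB per 4U"
SHELVES = 4

raw_pb = SHELVES * DISKS_PER_SHELF * TB_PER_DISK / 1000
print(f"~{raw_pb:.1f} PB raw, before RAID overhead")   # ~4.9 PB
```

The full bill of materials: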
| No | Name | Description | P/N | Qty per RAIDIX DC |
|----|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 6029P-TRT | SYS-6029P-TRT | 2 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 16 |
| 4 | System disk | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 4 |
| 5 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (Red tab) | MCP-220-00118-0B | 4 |
| 6 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 7 | HBA for JBOD connection | Broadcom HBA 9400-8e Tri-Mode Storage Adapter | 05-50013-01 | 4 |
| 8 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5 m | — | 1 |
| 9 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1 m | MCP1600-E001 | 2 |
| 10 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 11 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack | — | 8 |
| 12 | JBOD | Ultrastar Data102 storage enclosure | — | 1 |
| 13 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |
| 14 | Platform (MDC HyperFS) | SuperServer 6028R-E1CR12L | SSG-6028R-E1CR12L | 1 |
| 15 | CPU (MDC HyperFS) | Intel Xeon E5-2620v4 Processor | Intel Xeon E5-2620v4 Processor | 2 |
| 16 | Memory (MDC HyperFS) | 32GB Crucial CT32G4RFD424A DIMM ECC Reg PC4-19200 CL17 2400MHz | CT32G4RFD424A | 4 |
| 17 | System disk (MDC HyperFS) | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 2 |
| 18 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays (MDC HyperFS) | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (Red tab) | MCP-220-00118-0B | 2 |
| 19 | HBA (MDC HyperFS) | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 1 |

Here is an approximate wiring diagram:


Fig. 13. Scale-Out System Configuration

Conclusion


Working with large volumes of data, especially under write-intensive patterns, is a very hard task for a storage system, and the classic answer is to buy a shared-nothing scale-out system. The new JBOD from Western Digital together with RAIDIX software lets you build a storage system of several petabytes and several dozen GBps of throughput far more cheaply than horizontally scalable systems, and we recommend giving this solution your attention.

UPD


Added a system specification with Micron's NVDIMM-N:
| No | Name | Description | P/N | Qty per RAIDIX DC |
|----|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 6029P-TRT | SYS-6029P-TRT | 2 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 12 |
| 4 | NVRAM | 16GB (x72, ECC, SR) 288-Pin DDR4 Nonvolatile RDIMM MTA18ASF2G72PF1Z | MTA18ASF2G72PF1Z-2G6V21AB | 4 |
| 5 | System disk | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 4 |
| 6 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (Red tab) | MCP-220-00118-0B | 4 |
| 7 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 8 | HBA for JBOD connection | Broadcom HBA 9400-8e Tri-Mode Storage Adapter | 05-50013-01 | 4 |
| 9 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5 m | — | 1 |
| 10 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1 m | MCP1600-E001 | 2 |
| 11 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 12 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack | — | 8 |
| 13 | JBOD | Ultrastar Data102 storage enclosure | — | 1 |
| 14 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |

Source: https://habr.com/ru/post/354340/

