
What are big JBODs good for?
Western Digital's new 102-disk JBOD turned out to be a beast. Its design draws on experience with two previous generations of 60-disk shelves.
The Data102 turned out to be, rarely for giants of this class, well balanced in both capacity and performance.
But why do we need such large disk enclosures at all, when hyper-converged systems keep gaining popularity around the world?
Tasks in which storage capacity requirements far exceed compute requirements can inflate a customer's budget to incredible sizes. Here are just a few examples and scenarios:
- A replication factor of 2 or 3, as used when building scale-out systems, becomes an expensive proposition at a scale of several petabytes.
- Intensive sequential reads and writes force a cluster node to reach beyond its local storage, which can lead to problems such as long-tail latency. Here you have to be extremely careful when designing the network.
- Distributed systems excel at workloads like "many applications each working with many of their own files", but are mediocre at reads and writes against a tightly coupled cluster, especially in N-to-1 mode.
- For tasks like "double the depth of the video archive", it is much cheaper to add a big JBOD than to double the number of servers in the cluster.
- With an external storage system and a JBOD, we can explicitly allocate capacity and performance to our priority applications by reserving specific disks, caches and ports for them, while keeping the necessary flexibility and scalability.
As a rule, disk shelves of the Data102 class are developed by the disk manufacturers themselves, who understand how to work with these disks and know all the pitfalls. In such devices vibration and cooling are well under control, and power consumption matches what the drives actually need.
What is good about Western Digital's JBOD?
We are well aware that modular systems are limited in scalability by the capabilities of their controllers, and that the network always adds latency. But at the same time, such systems have a lower cost per IOps, per GBps and per TB of storage.
There are two things RAIDIX engineers came to love about the Data102:
- The JBOD not only lets you pack more than 1 PB of data into 4U; it is genuinely fast, and on streaming workloads it is not inferior to many all-flash solutions. 4U, 1 PB and 23 GB/s are good numbers for a disk array.
- The Data102 is easy to maintain and requires no tools such as a screwdriver.
Our test team hates screwdrivers so much that they see them in their dreams. When they heard that HGST/WD was making a 102-disk monster, and pictured having to deal with 408 little screws, the nearby store ran out of strong alcohol.
Their fears were in vain. Caring for engineers, Western Digital came up with a new way of mounting drives that simplifies maintenance: the disks are attached to the chassis with retaining clips, with no bolts or screws, and all drives are mechanically isolated by elastic fasteners on the rear panel. New servo firmware and accelerometers do a fine job of compensating for vibration.
What's in the box?
The box contains the enclosure itself, populated with disks. You can buy it with as few as 24 disks, and the solution scales in sets of 12 disks; this is done to ensure proper cooling and to handle vibration as well as possible.
Incidentally, it was the development of two supporting technologies, IsoVibe and ArcticFlow, that made the new JBOD possible.
IsoVibe consists of the following components:
- Specialized drive firmware that uses sensor data to control the servos and predictively reduce vibration.
- Vibration-isolated connectors at the rear of the enclosure (Fig. 1).
- And, of course, the special screw-free drive mounting.
Fig. 1. Vibration-isolated connectors
Temperature is the second factor after vibration that kills hard drives. At an average operating temperature above 55 °C, the mean time between failures of a hard disk drops to half the rated value.
Poor cooling hits servers with many disks and large disk shelves especially hard: the rear rows of disks often run more than 20 degrees hotter than the disks near the cold aisle.
ArcticFlow is Western Digital's patented shelf-cooling technology. Its idea is to create additional ducts inside the chassis that draw cold air to the rear rows of disks directly from the cold aisle, bypassing the front rows.
Fig. 2. How ArcticFlow works
A separate stream of cold air is routed to cool the I/O modules and power supplies.
The result is an excellent thermal map of a running shelf. The spread between the front and rear rows of disks is 10 degrees, and the hottest disk sits at 49 °C with a cold-aisle temperature of +35 °C. Only 1.6 W is spent on cooling each disk, half as much as in comparable chassis. The fans are quieter, vibration is lower, and the drives live longer and run faster.
Fig. 3. Thermal map of the Ultrastar Data102
Given the 12 W power budget per drive slot, the shelf can easily be made hybrid: out of the 102 disks, 24 can be SAS SSDs. They can be used in a hybrid configuration, or you can set up SAS zoning and hand them to a host that needs all-flash.
The box also includes the rack-mounting kit. Installing the JBOD takes a couple of physically strong engineers. Here is what they will face:
- An assembled shelf weighs 120 kg; without disks it is 32 kg.
- Rack depth for this chassis starts at 1200 mm.
- And then there are the SAS and power cables.
The JBOD's mounting and cabling are designed so that maintenance can be performed hot. Note also the vertical installation of the I/O modules (IOMs).
Let's take a look at the system. The front is simple and clean.
Fig. 4. Ultrastar Data 102. Front view
One of the most interesting features of this JBOD is that the I/O modules are installed from the top!
Fig. 5. Ultrastar Data 102
Fig. 6. Ultrastar Data 102. Top view
Fig. 7. Ultrastar Data 102. Top view without disks
At the back, the JBOD has 6 SAS 12G ports per I/O module, giving a total of 28,800 MBps of back-end bandwidth. The ports can be used for host connections and, in part, for cascading. There are two power inlets (80 PLUS Platinum rated 1600 W CRPS).
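That figure is easy to sanity-check. A minimal back-of-the-envelope sketch, assuming 4-lane mini-SAS HD ports and roughly 1200 MB/s of usable bandwidth per 12 Gbit/s lane (these lane assumptions are ours, not taken from the vendor datasheet):

```python
# Back-of-the-envelope check of the quoted back-end bandwidth.
# Assumptions: each mini-SAS HD port is 4 lanes wide, and one 12 Gbit/s
# SAS-3 lane delivers roughly 1200 MB/s after encoding overhead.
ports = 6              # SAS 12G ports per I/O module
lanes_per_port = 4
mbps_per_lane = 1200   # usable MB/s per lane

print(ports * lanes_per_port * mbps_per_lane)  # 28800 MB/s, matching the number above
```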
Fig. 8. Ultrastar Data 102. Rear view
Performance
As we said, the Data102 is not just huge, it is also fast! Here are the results of the vendor's own tests:
On 12 servers:
Sequential load
- Read = 24.2 GB/s max @ 1MB (237 MB/s per HDD max)
- Write = 23.9 GB/s max @ 1MB (234 MB/s per HDD max)
Random load
- 4kB read at queue depth 128: >26k IOps
- 4kB write at queue depth 1–128: >45k IOps
On 6 servers:
Sequential load
- Read = 22.7 GB/s max @ 1MB (223 MB/s per HDD max)
- Write = 22.0 GB/s max @ 1MB (216 MB/s per HDD max)
Random load
- 4kB read at queue depth 128: >26k IOps
- 4kB write at queue depth 1–128: >45k IOps
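The vendor's exact test harness is not published, so here is only a minimal sketch of how comparable sequential and random loads could be generated with fio; the device path, runtime and job parameters are illustrative assumptions, not the vendor's methodology:

```python
# Illustrative fio invocations approximating the load profiles above.
# /dev/sdX is a placeholder for a RAID volume or drive under test.
import subprocess

def run_fio(name, rw, block_size, iodepth, device="/dev/sdX", runtime=60):
    """Run one fio job against a block device and return its text report."""
    cmd = [
        "fio",
        f"--name={name}",
        f"--filename={device}",
        f"--rw={rw}",              # read, write, randread or randwrite
        f"--bs={block_size}",      # 1M for sequential, 4k for random tests
        f"--iodepth={iodepth}",
        "--ioengine=libaio",
        "--direct=1",              # bypass the page cache
        "--time_based",
        f"--runtime={runtime}",
        "--group_reporting",
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# Sequential 1 MB read with a deep queue, comparable to the streaming tests above.
print(run_fio("seq-read", "read", "1M", 128))
# Random 4 kB read at queue depth 128, comparable to the random-read tests.
print(run_fio("rand-read", "randread", "4k", 128))
```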
Fig. 9. Parallel load from 12 servers
Fig. 10. Parallel load from 6 servers
Management
There are two ways to manage the JBOD from the software side:
- via SES (SCSI Enclosure Services)
- via Redfish
Redfish lets you locate components by lighting their LEDs, read health information about the components, and update firmware.
By the way, the chassis supports T10 Power Disable (Pin 3), which lets you power off and reset individual drives.
This comes in handy when a single disk hangs the entire SAS bus.
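As a minimal sketch of what Redfish management can look like in practice (the management address, credentials and chassis ID below are placeholders, and exact resource paths can vary between firmware versions):

```python
# Query a Redfish-capable enclosure: read chassis health and flash the locate LED.
# Address, credentials and chassis ID are illustrative placeholders.
import requests

BASE = "https://192.0.2.10/redfish/v1"   # example management IP of the enclosure
AUTH = ("admin", "password")             # placeholder credentials

# Read the chassis resource: overall status and current indicator LED state.
chassis = requests.get(f"{BASE}/Chassis/Enclosure", auth=AUTH, verify=False).json()
print(chassis.get("Status"), chassis.get("IndicatorLED"))

# Light up the locate LED to find the unit in the rack.
requests.patch(
    f"{BASE}/Chassis/Enclosure",
    json={"IndicatorLED": "Blinking"},
    auth=AUTH,
    verify=False,
)

# Fan and temperature sensor readings live under the Thermal sub-resource.
thermal = requests.get(f"{BASE}/Chassis/Enclosure/Thermal", auth=AUTH, verify=False).json()
for sensor in thermal.get("Temperatures", []):
    print(sensor.get("Name"), sensor.get("ReadingCelsius"))
```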
Typical configurations
To get the most out of such a JBOD, we need RAID controllers or RAID software. This is where RAIDIX software comes to the rescue.
To build a fault-tolerant storage system, we need two storage nodes and one or more enclosures with SAS disks. If we do not need protection against node failure, or we rely on data replication instead, we can connect a single server to the enclosure and use SATA disks.
Dual controller configuration
As controllers for RAIDIX-based storage systems, virtually any x86 server platform can be used: Supermicro, AIC, Dell, Lenovo, HPE, and many others. We are constantly working on certifying new equipment and porting our code to various architectures (for example, Elbrus and OpenPower).
As an example, let's take a Supermicro platform and try to achieve the highest possible throughput and density. When sizing the servers, the main budget is the PCI-E bus, which has to accommodate both the back-end and the front-end controllers.
To connect the disk shelf we need HBAs: at least two AVAGO 9300-8e. Alternatively, a pair of 9400-8e, or a single 9405W-16e, though the latter needs a full x16 slot.
The next component is a slot for the cache-sync channel, which can be InfiniBand or SAS. (For tasks where bandwidth and latency are not critical, you can get by with synchronization through the enclosure, without a dedicated slot.)
And, of course, we need slots for the host interfaces, of which there should also be at least two.
In total, each controller needs at least 5 x8 slots (with no margin left for further scaling). To build low-cost systems targeting 3-4 GB/s per node, we can get by with just two slots.
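As a quick tally of the slot budget just described (the adapter mix is the one used in this example, not a strict requirement):

```python
# PCIe slot budget per controller for the maximum-performance layout above.
slots_per_controller = {
    "back-end HBAs (JBOD connection)": 2,   # e.g. two AVAGO 9300-8e
    "cache-sync adapter (IB or SAS)": 1,
    "front-end host adapters": 2,
}
print(sum(slots_per_controller.values()))  # 5 x8 slots, before any scaling margin
```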
Controller Configuration Options
Supermicro 6029P-TRT
The controllers can be housed in two 2U 6029P-TRT servers. They are not the richest in PCI-E slots, but they come with a standard motherboard without risers. These boards are guaranteed to accept Micron NVDIMM-N modules, which protect the cache against power failures.
To connect the disks we take the Broadcom 9400-8e. Dirty cache segments will be synchronized over 100 Gb InfiniBand.
Attention! The following configurations are designed for maximum performance with every available option in use. For your specific task the specification can be trimmed considerably; contact our partners.
Here is the system configuration we ended up with:
| # | Name | Description | P/N | Qty per RAIDIX DC |
|---|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 6029P-TRT | SYS-6029P-TRT | 2 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 12 |
| 4 | System disk | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 4 |
| 5 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (red tab) | MCP-220-00118-0B | 4 |
| 6 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 7 | HBA for JBOD connection | Broadcom HBA 9400-8e Tri-Mode Storage Adapter | 05-50013-01 | 4 |
| 8 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5m | | 1 |
| 9 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1m | MCP1600-E001 | 2 |
| 10 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 11 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack storage enclosure | | 8 |
| 12 | JBOD | Ultrastar Data102 | | 1 |
| 13 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |
Here is an approximate diagram:
Fig. 11. Configuration based on the Supermicro 6029P-TRT
Supermicro 2029BT-DNR
If we want to fight for space in the server room, a Supermicro Twin, for example the 2029BT-DNR, can serve as the basis for the storage controllers. Each node has 3 PCI-E slots and one I/O module (IOM), and the IOM options include the InfiniBand we need.
Configuration:
| # | Name | Description | P/N | Qty per RAIDIX DC |
|---|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 2029BT-DNR | SYS-2029BT-DNR | 1 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 12 |
| 4 | System disk | Supermicro SSD-DM032-PHI | SSD-DM032-PHI | 2 |
| 5 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 6 | HBA for JBOD connection | Broadcom HBA 9405W-16e Tri-Mode Storage Adapter | 05-50044-00 | 2 |
| 7 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5m | | 1 |
| 8 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1m | MCP1600-E001 | 2 |
| 9 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 10 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack storage enclosure | | 8 |
| 11 | JBOD | Ultrastar Data102 | | 1 |
| 12 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |
Here is an approximate diagram:
Fig. 12. Configuration based on the Supermicro 2029BT-DNR
1U platform
There are often tasks that require maximum density for large amounts of data but do not require, say, full controller fault tolerance. In that case we take a 1U system as the basis and connect the maximum number of disk shelves to it.
Scale-out system
As a final exercise, let's build a horizontally scalable system based on HyperFS. To begin with, we pick two types of controllers: one for data storage and one for metadata.
The storage controllers will again be SuperMicro 6029P-TRT servers.
To store the metadata, we use several SSDs in the enclosure, combine them into a RAID and present the volume to the MDC over the SAN. Up to 4 JBODs can be cascaded per storage system. In total, one deep rack can hold X PB of data under a single namespace.
| # | Name | Description | P/N | Qty per RAIDIX DC |
|---|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 6029P-TRT | SYS-6029P-TRT | 2 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 16 |
| 4 | System disk | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 4 |
| 5 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (red tab) | MCP-220-00118-0B | 4 |
| 6 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 7 | HBA for JBOD connection | Broadcom HBA 9400-8e Tri-Mode Storage Adapter | 05-50013-01 | 4 |
| 8 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5m | | 1 |
| 9 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1m | MCP1600-E001 | 2 |
| 10 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 11 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack storage enclosure | | 8 |
| 12 | JBOD | Ultrastar Data102 | | 1 |
| 13 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |
| 14 | Platform (MDC HyperFS) | SuperServer 6028R-E1CR12L | SSG-6028R-E1CR12L | 1 |
| 15 | CPU (MDC HyperFS) | Intel Xeon E5-2620v4 Processor | Intel Xeon E5-2620v4 Processor | 2 |
| 16 | Memory (MDC HyperFS) | 32GB DDR4 DIMM ECC Reg Crucial CT32G4RFD424A PC4-19200 CL17 2400MHz | CT32G4RFD424A | 4 |
| 17 | System disk (MDC HyperFS) | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 2 |
| 18 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays (MDC HyperFS) | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (red tab) | MCP-220-00118-0B | 2 |
| 19 | HBA (MDC HyperFS) | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 1 |
Here is an approximate wiring diagram:
Fig. 13. Scale-out system configuration
Conclusion
Working with large amounts of data, especially under write-intensive patterns, is a hard task for a storage system, and the classic answer to it is to buy a shared-nothing scale-out system. The new JBOD from Western Digital combined with RAIDIX software lets you build storage of several petabytes and several dozen GBps of performance far more cheaply than horizontally scalable systems, and we recommend taking a close look at this solution.
UPD
Added a system specification with Micron NVDIMM-N:
| # | Name | Description | P/N | Qty per RAIDIX DC |
|---|------|-------------|-----|-------------------|
| 1 | Platform | SuperServer 6029P-TRT | SYS-6029P-TRT | 2 |
| 2 | CPU | Intel Xeon Silver 4112 Processor | Intel Xeon Silver 4112 Processor | 4 |
| 3 | Memory | 16GB PC4-21300 2666MHz DDR4 ECC Registered DIMM Micron MTA36ASF472PZ-2G6D1 | MEM-DR416L-CL06-ER26 | 12 |
| 4 | NVRAM | 16GB (x72, ECC, SR) 288-Pin DDR4 Nonvolatile RDIMM MTA18ASF2G72PF1Z | MTA18ASF2G72PF1Z-2G6V21AB | 4 |
| 5 | System disk | SanDisk Extreme PRO 240GB | SDSSDXPS-240G-G25 | 4 |
| 6 | Hot-swap 3.5" to 2.5" SATA/SAS drive trays | Tool-less black hot-swap 3.5-to-2.5 converter HDD drive tray (red tab) | MCP-220-00118-0B | 4 |
| 7 | HBA for cache sync | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 8 | HBA for JBOD connection | Broadcom HBA 9400-8e Tri-Mode Storage Adapter | 05-50013-01 | 4 |
| 9 | Ethernet patch cord | Ethernet patch cord for cache sync, 0.5m | | 1 |
| 10 | Cable for cache sync | Mellanox passive copper cable, VPI, EDR, 1m | MCP1600-E001 | 2 |
| 11 | HBA for host connection | Mellanox ConnectX-4 VPI adapter card, EDR IB (100Gb/s), dual-port QSFP28, PCIe 3.0 x16 | MCX456A-ECAT | 2 |
| 12 | SAS cable | Ultrastar Data102 Cable IO HD mini-SAS to HD mini-SAS 2m 2Pack storage enclosure | | 8 |
| 13 | JBOD | Ultrastar Data102 | | 1 |
| 14 | RAIDIX | RAIDIX 4.6 DC / NAS / iSCSI / FC / SAS / IB / SSD-cache / QoSmic / SanOpt / Extended 5 years support / unlimited disks | RX46DSMMC-NALL-SQ0S-P5 | 1 |