
How to build a scalable infrastructure on Storage Spaces

We have written more than once about cluster-in-a-box solutions and Windows Storage Spaces, and the most common question is: how do you choose the right disk configuration?

A typical request usually looks like this:

- I want 100500 IOPS!
or:

- I want 20 terabytes.

Then it turns out that far fewer IOPS are actually needed and a few SSDs would suffice, while the 20 terabytes (just 4 disks these days) are also expected to perform reasonably well, so in practice the answer ends up being tiered storage.

How do you approach this correctly and plan ahead?

You need to answer several key questions in sequence, for example:


The process of creating storage on Storage Spaces
Main technologies:


Advantages of Storage Spaces: a flexible and inexpensive data storage solution built entirely on commodity hardware and Microsoft software.

The material assumes familiarity with the basics of Storage Spaces (available here and here).

Steps:
1. Solution evaluation: determine the key requirements for the solution and the criteria of a successful installation.
2. SDS design (in advance): select the building blocks (hardware and software) that meet the requirements, choose the topology and configuration.
3. Test deployment: verify the chosen solution in a real environment, possibly at a reduced scale (Proof-of-Concept, PoC).
4. Validation: check compliance with the requirements from step 1. It usually starts with a synthetic load (SQLIO, Iometer, etc.); the final check is carried out in a real environment.
5. Optimization: based on the results of the previous steps and the difficulties revealed, adjust and optimize the solution (add/remove/replace hardware blocks, modify the topology, reconfigure the software, etc.) and return to step 4.
6. Deployment (in production): after the initial solution has been optimized to meet the stated requirements, the final version is rolled out into the real operating environment.
7. Operation: monitoring, troubleshooting, upgrading, scaling, etc.


The first step was described above.

Defining the Software-Defined Storage architecture (in advance)

Highlights:


General principles

Install all updates and patches (including firmware).

Try to make the configuration symmetrical and complete.

Consider possible failures and prepare a plan for mitigating their consequences.


Why firmware is important

Key requirements: tiering

Tiered storage significantly increases the overall performance of the storage system. However, SSDs have a relatively small capacity and a high price per gigabyte, and they somewhat increase the complexity of managing the storage.

Tiering is usually used anyway, with the "hot" data placed on the fast SSD tier.



Key requirements: type and number of hard drives

The type, capacity and number of disks should reflect the desired capacity of the storage system. SAS and NL-SAS drives are supported, but expensive 10/15K SAS drives are rarely needed when tiering with an SSD tier is used.

A typical choice*: 2-4 TB NL-SAS, a single model from one manufacturer with the latest stable firmware version.

* Typical values reflect current recommendations for virtualized workloads. Other load types require additional analysis.

Capacity calculation:



Where:
a) The number of hard drives (the result).
b) The number of disks is rounded up to the nearest even value.
c) The size of the data to be stored (in gigabytes).
d) Reserve for capacity expansion.
e) Capacity lost to resiliency: two-way mirror, three-way mirror, parity and other layouts offer a different margin of "durability" and consume correspondingly different amounts of raw capacity.
f) Each disk in Storage Spaces stores a copy of the metadata (about pools and virtual disks); if necessary, also reserve space for the Fast Rebuild feature.
g) Disk capacity in gigabytes.
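
A minimal PowerShell sketch of how these quantities might combine (the variable names, the illustrative values and the exact arithmetic are assumptions based on the list above, not the article's exact formula):

```powershell
# Hedged sketch: estimate the number of HDDs needed for capacity.
$DataSizeGB        = 20000   # c) data to be stored, GB (illustrative)
$ExpansionReserve  = 1.2     # d) 20% reserve for growth (illustrative)
$CopiesMultiplier  = 3       # e) resiliency overhead: 2 for a 2-way mirror, 3 for a 3-way mirror
$MetadataRebuildGB = 4000    # f) reserve for metadata and Fast Rebuild (illustrative)
$DiskCapacityGB    = 4000    # g) capacity of one HDD, GB

$raw  = $DataSizeGB * $ExpansionReserve * $CopiesMultiplier + $MetadataRebuildGB
$hdds = [math]::Ceiling($raw / $DiskCapacityGB)
if ($hdds % 2) { $hdds++ }   # b) round up to the nearest even number of disks
$hdds                        # a) estimated number of hard drives
```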

Performance calculation:



Where:
a) The number of hard drives (the result).
b) The number of disks is rounded up to the nearest even value.
c) The number of IOPS expected from the HDD tier.
d) The write percentage of the workload.
e) The write penalty, which differs between resiliency types and must be taken into account for the system as a whole (for example, for a 2-way mirror one write operation causes 2 writes to the disks, for a 3-way mirror there are already 3).
f) Estimated write performance of one disk.
g) Estimated read performance of one disk.

* The calculations give only a starting point for planning the system.
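
A similar hedged sketch for the IOPS-driven estimate (the write share is an assumption; the per-disk IOPS figures are taken from the example further below):

```powershell
# Hedged sketch: estimate the number of HDDs needed for a target IOPS level.
$TargetIops   = 10000   # c) IOPS expected from the HDD tier
$WriteShare   = 0.4     # d) share of writes in the workload (assumed)
$WritePenalty = 3       # e) write penalty: 2 for a 2-way mirror, 3 for a 3-way mirror
$WriteIops    = 130     # f) estimated write IOPS of one HDD
$ReadIops     = 140     # g) estimated read IOPS of one HDD

$hdds = [math]::Ceiling($TargetIops * ($WriteShare * $WritePenalty / $WriteIops + (1 - $WriteShare) / $ReadIops))
if ($hdds % 2) { $hdds++ }   # b) round up to the nearest even number of disks
$hdds                        # a) estimated number of hard drives
```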

Key requirements: type and number of SSD

With the hard drives sorted out, it is time to count the SSDs. Their type, quantity and capacity are determined by the desired maximum performance of the disk subsystem.

Increasing the capacity of the SSD tier allows the built-in tiering engine to keep more of the workload on the fast tier.

Since the number of columns (Columns, the data parallelization mechanism) in a virtual disk must be the same for both tiers, increasing the number of SSDs usually allows you to create more columns and thus raise the performance of the HDD tier as well.

A typical choice: 200-1600 GB MLC-based SSDs, a single model from one manufacturer with the latest stable firmware version.

Performance calculation *:



Where:
a) The number of SSDs (the result).
b) The number of SSDs is rounded up to the nearest even value.
c) The number of IOPS expected from the SSD tier.
d) The write percentage of the workload.
e) The write penalty, which differs between resiliency types and must be taken into account when planning.
f) Estimated write performance of one SSD.
g) Estimated read performance of one SSD.

* Only for a starting point. Usually, the number of SSDs significantly exceeds the required minimum for performance due to additional factors.
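
The calculation mirrors the HDD-tier sketch above: substitute the SSD tier's target IOPS, the assumed write share and penalty, and the per-SSD write/read estimates, then round up to an even count and check the result against the minimums in the table below.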

The recommended minimum number of SSDs in the shelves:

| Array type | 24-disk JBOD (2 columns) | 60-disk JBOD (4 columns) |
| --- | --- | --- |
| Two-way mirror | 4 | 8 |
| Three-way mirror | 6 | 12 |


Key requirements: SSD: HDD ratio

The ratio is a balance of performance, capacity and cost. Adding SSDs improves performance (most operations are served by the SSD tier, the number of columns in virtual disks can grow, etc.), but noticeably increases the cost and reduces the potential capacity (a small SSD occupies the slot of a high-capacity disk).

A typical SSD:HDD ratio*: 1:4 to 1:6

* By quantity, not capacity.



Key Requirements: Number and Configuration of Disk Shelves (JBOD)

Disk shelves differ in many ways - the number of disks to be installed, the number of SAS ports, etc.

Using multiple shelves allows you to consider their availability in fault tolerance schemes (using the enclosure awareness feature), but also increases the disk space required for Fast Rebuild.

The configuration should be symmetrical across all shelves (in terms of cabling and disk placement).

A typical choice: number of shelves >= 2, two I/O modules per shelf, a single model with the latest firmware, and a symmetrical arrangement of disks across all shelves.



Usually the number of disk shelves is derived from the total number of disks, with a margin for expansion, shelf-level fault tolerance (the enclosure awareness feature) and/or additional SAS paths for throughput and resiliency.

Calculation:



Where:
a) The number of JBODs.
b) Rounded up.
c) The number of HDD.
d) SSD count.
e) Free drive slots.
f) Maximum number of disks in JBOD.
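
A small sketch of the same estimate (the counts are taken from the example later in the article):

```powershell
# Hedged sketch: estimate the number of JBOD shelves.
$HddCount     = 136   # c) number of HDDs
$SsdCount     = 32    # d) number of SSDs
$FreeSlots    = 0     # e) free drive slots kept for expansion
$SlotsPerJbod = 60    # f) maximum number of disks in one JBOD

$jbods = [math]::Ceiling(($HddCount + $SsdCount + $FreeSlots) / $SlotsPerJbod)   # b) rounded up
$jbods                                                                           # a) number of JBODs
```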

Key requirements: SAS HBA and cables

The SAS cable topology must connect each storage server to the JBOD shelves with at least one SAS path (that is, one SAS port on the server is connected to one SAS port on each JBOD).

Depending on the number and type of disks in the JBOD shelves, the total performance can easily reach the limit of one 4x 6G SAS port (~ 2.2 GB / s).

Recommended:


Key requirements: storage server configuration

Here it is necessary to take into account such factors as server performance, the number of servers in the cluster for fault tolerance and serviceability, load characteristics (total IOPS, throughput, etc.), use of offload technologies (RDMA), the number of SAS ports per JBOD, and the multipath requirements.

Recommended: 2-4 servers; two processors with 6+ cores each; >= 64 GB of memory; two local mirrored hard drives; two 1+ GbE ports for management and two 10+ GbE RDMA ports for data traffic; a BMC port, either dedicated or shared with 1 GbE, supporting IPMI 2.0 and/or SMASH; SAS HBAs as described in the previous section.

Key requirements: number of disk pools

At this step we take into account that pools are units of both management and fault tolerance. A failed disk in a pool affects all virtual disks located on it, and each disk in the pool contains metadata.

Increasing the number of pools:


Typical choice:


Key requirements: pool configuration

The pool holds the default settings for the virtual disks created on it and several options that affect storage behavior:

| Option | Description |
| --- | --- |
| RepairPolicy | Sequential vs. Parallel (respectively, lower IO load but slow, or high IO load but fast). |
| RetireMissingPhysicalDisks | With Fast Rebuild, disks that drop out do not trigger a pool repair when the value is Auto (if the value is Always, the pool repair is always started using the spare disk, not immediately but 5 minutes after the first unsuccessful attempt to write to that disk). |
| IsPowerProtected | True means that all write operations are considered complete without confirmation from the disk. A power outage can cause data corruption if there is no protection (a similar technology has appeared on hard drives themselves). |


Typical choice:


Key requirements: number of virtual disks (VD)

Take into account the following features:




A typical choice for the number of VDs: 2-4 per storage server.

Key requirements: virtual disk configuration (VD)

Three resiliency types are available - Simple, Mirror and Parity - but only Mirror is recommended for virtualization workloads.

3-way mirroring, compared to 2-way, doubles the protection against disk failure at the cost of a slight decrease in performance (a higher write penalty), less available space and a correspondingly higher cost.



Comparison of mirroring types:

| Number of pools | Mirror type | Overhead | Pool fault tolerance | System fault tolerance |
| --- | --- | --- | --- | --- |
| 1 | 2-way | 50% | 1 disk | 1 disk |
| 1 | 3-way | 67% | 2 disks | 2 disks |
| 2 | 2-way | 50% | 1 disk | 2 disks |
| 2 | 3-way | 67% | 2 disks | 4 disks |
| 3 | 2-way | 50% | 1 disk | 3 disks |
| 3 | 3-way | 67% | 2 disks | 6 disks |
| 4 | 2-way | 50% | 1 disk | 4 disks |
| 4 | 3-way | 67% | 2 disks | 8 disks |


For its Cloud Platform System (CPS), Microsoft recommends a 3-way mirror.

Key requirements: number of columns

Usually, increasing the number of columns increases the performance of the VD, but latency may also increase (more columns means more disks that must acknowledge each operation). The formula for calculating the maximum number of columns in a particular mirrored VD:



Where:
a) Rounding down.
b) The number of disks in a smaller level (usually the number of SSDs).
c) Fault tolerance by disk. For 2-way mirror it is 1 disk, for 3-way mirror it is 2 disks.
d) The number of copies of the data. For a 2-way mirror it is 2, for a 3-way mirror it is 3.
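
A hedged sketch of that calculation (this is one plausible reading of the variables above, not a verified reproduction of the original formula):

```powershell
# Hedged sketch: maximum number of columns for a mirrored, tiered virtual disk.
$DisksInSmallerTier = 12   # b) usually the number of SSDs in the pool
$FaultTolerance     = 2    # c) 1 disk for a 2-way mirror, 2 disks for a 3-way mirror
$DataCopies         = 3    # d) 2 for a 2-way mirror, 3 for a 3-way mirror

$maxColumns = [math]::Floor(($DisksInSmallerTier - $FaultTolerance) / $DataCopies)   # a) rounded down
$maxColumns
```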

If the maximum number of columns is selected, then after a disk failure the pool no longer has enough disks to match the number of columns, and Fast Rebuild becomes unavailable.

The usual choice is 4-6 columns (1 less than the calculated maximum, so that Fast Rebuild can work). This setting is available only through PowerShell; when a disk is created through the graphical interface, the system chooses the maximum possible number of columns, up to 8.

Key Requirements: Virtual Disk Options

| Virtual disk option | Considerations |
| --- | --- |
| Interleave | For random workloads (virtualization, for example), the interleave size must be greater than or equal to the largest active request, since any request larger than the interleave is split into several operations, which reduces performance. |
| WBC cache size | The default value is 1 GB, a reasonable balance between performance and fault tolerance for most tasks (a larger cache increases failover time: when a CSV volume is moved, the cache has to be flushed and restored, which can also lead to denial-of-service problems). |
| IsEnclosureAware | Improves protection against failures; use it whenever possible. To enable it, the number of disk shelves must match the resiliency type (available only through PowerShell): a 3-way mirror needs 3 shelves, dual parity needs 4, etc. This feature allows the storage to survive the complete loss of a disk shelf. |


Typical choice:

Interleave: 256K (default)
WBC Size: 1GB (default)
IsEnclosureAware: use whenever possible.

Key requirements: virtual disk size

Obtaining the optimal VD size on a pool requires calculating each storage tier separately and then summing the optimal tier sizes. It is also necessary to reserve space for Fast Rebuild and metadata, as well as for internal rounding during the calculations (buried deep in the Storage Spaces stack, so we simply reserve for it).

Formula:



Where:
a) A conservative approach that leaves slightly more unallocated space in the pool than Fast Rebuild strictly needs. The value is in GiB (powers of 2, whereas GB is based on powers of 10).
b) Value in GiB.
c) Reserved space for Storage Spaces metadata (all disks in the pool contain metadata about both the pool and the virtual disks).
d) Reserved space for Fast Rebuild: >= the size of one disk (+ 8 GiB) for each tier in the pool per disk shelf.
e) The tier size is rounded to a whole number of "tiles" (Spaces splits each disk in a pool into pieces called "tiles" that are used to build the array); the tile size equals the Storage Spaces allocation unit (1 GiB) multiplied by the number of columns, so we round the size down to the nearest whole tile to avoid allocating excess space.
f) The write-back cache size, in GiB, for the given tier (1 for the SSD tier, 0 for the HDD tier).
g) The number of disks of the given tier in the pool.
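
Because the exact reserves and rounding live deep inside the Storage Spaces stack, the following is only a rough, heavily hedged sketch of the per-tier arithmetic (all names, values and the order of operations are assumptions; treat the result as a starting point only):

```powershell
# Hedged sketch: rough per-VD size of one storage tier in a pool.
$DisksInTier = 20               # g) disks of this tier in the pool (illustrative)
$DiskSizeGiB = 1863             # usable size of one disk in GiB (a 2 TB disk is ~1863 GiB)
$DataCopies  = 3                # 3 for a 3-way mirror
$Columns     = 3                # columns chosen for the virtual disks
$VDsPerPool  = 2                # virtual disks per pool
$MetadataGiB = 8                # c) conservative reserve for Storage Spaces metadata (assumed)
$RebuildGiB  = $DiskSizeGiB + 8 # d) Fast Rebuild reserve: one disk + 8 GiB per tier
$WbcGiB      = 0                # f) write-back cache per VD (1 for the SSD tier, 0 for the HDD tier)

$usableGiB = ($DisksInTier * $DiskSizeGiB - $MetadataGiB - $RebuildGiB) / $DataCopies
$perVdGiB  = [math]::Floor($usableGiB / $VDsPerPool / $Columns) * $Columns - $WbcGiB   # e) whole "tiles"
$perVdGiB
```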

Spaces-Based SDS: Next Steps



We have settled on the initial estimates; now we move on. Changes and improvements (and there will be some) will be made to these results based on testing.

Example!

What do you want to get?


Equipment:

2/4/6/8 TB drives
IOPS (R/W): 140/130 (declared specifications: 175 MB/s, 4.16 ms)

SSDs with 200/400/800/1600 GB capacity
IOPS:
Read: 7000 IOPS @ 460 MB/s (declared: 120K)
Write: 5500 IOPS @ 360 MB/s (declared: 40K)

In real-world tasks, SSDs show figures that differ significantly from the declared ones :)

Disk shelves: ETegro Fastor JS300 - 60 disks, two SAS I/O modules with 4 ports each.

Input data

High performance is required, so tiering is needed.

High reliability is required and the capacity requirement is modest, so we use 3-way mirroring.

Drive selection (based on performance and budget requirements): 4TB HDD and 800GB MLC SSD

Calculations:

Spindles by capacity:



Spindles by performance (we want to get 10K IOPS):



The total capacity significantly exceeds the initial estimates, as it is necessary to meet the performance requirements.
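
As a rough illustration using the HDD performance formula above (assuming, for example, a 60/40 read/write split and the per-disk estimates of 140/130 IOPS): 10,000 × (0.4 × 3 / 130 + 0.6 / 140) ≈ 135.2, which rounds up to the nearest even value of 136 HDDs.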

SSD level:



The SSD:HDD ratio:



Number of disk shelves:
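
As a cross-check with the shelf-count formula above: (136 + 32) / 60 = 2.8, which rounds up to 3 shelves.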



The location of the disks in the shelves:

We increase the number of disks to get a symmetrical distribution and an optimal SSD:HDD ratio. For ease of expansion it is also recommended to fill the disk shelves completely.

SSD: 32 -> 36
HDD: 136 -> 144 (SSD:HDD = 1:4)
SSD / Enclosure: 12
HDD / Enclosure: 48

SAS cables

For a fault-tolerant connection and maximum bandwidth, each disk shelf must be connected with two cables (two paths from the server to each shelf). In total: 6 SAS ports per server.

Number of servers

Based on the fault tolerance, IO and budget requirements and the multipath topology, we take 3 servers.

Number of pools

The number of disks in a pool must be 80 or less; we take 3 pools (180 / 80 = 2.25, rounded up to 3).

HDDs / Pool: 48
SSDs / Pool: 12

Pool configuration

Hot Spares: No
Fast Rebuild: Yes (reserve enough space)
RepairPolicy: Parallel (default)
RetireMissingPhysicalDisks: Always
IsPowerProtected: False (default)
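
A minimal PowerShell sketch of creating and configuring one such pool (the pool name, the storage subsystem lookup and the disk selection are simplified placeholders; the option values follow the list above):

```powershell
# Hedged sketch: create one of the pools and apply the settings listed above.
$disks = Get-PhysicalDisk -CanPool $true | Select-Object -First 60   # 48 HDD + 12 SSD

New-StoragePool -FriendlyName "Pool1" -PhysicalDisks $disks `
                -StorageSubSystemFriendlyName (Get-StorageSubSystem).FriendlyName

Set-StoragePool -FriendlyName "Pool1" -RepairPolicy Parallel `
                -RetireMissingPhysicalDisks Always -IsPowerProtected $false
```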

Number of virtual disks

Based on the requirements, we use 2 VDs per server, 6 in total, evenly distributed across the pools (2 per pool).

Virtual Disk Configuration

Based on the requirements for resilience to data loss (for example, a failed disk may not be replaced for several days) and the load profile, we use the following settings:

Resiliency: 3-way mirroring
Interleave: 256K (default)
WBC Size: 1GB (default)
IsEnclosureAware: $true
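
A hedged PowerShell sketch of creating the tiers and one virtual disk with these settings (the names are placeholders; the column count is illustrative and the tier sizes come from the calculations in the following subsections):

```powershell
# Hedged sketch: define the storage tiers once per pool, then create a tiered virtual disk.
$ssdTier = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SSDTier" -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "HDDTier" -MediaType HDD

$vdParams = @{
    StoragePoolFriendlyName = "Pool1"
    FriendlyName            = "VD01"
    ResiliencySettingName   = "Mirror"
    NumberOfDataCopies      = 3                   # 3-way mirroring
    NumberOfColumns         = 3                   # illustrative; see the column calculation below
    Interleave              = 256KB
    WriteCacheSize          = 1GB
    IsEnclosureAware        = $true
    StorageTiers            = $ssdTier, $hddTier
    StorageTierSizes        = 1110GB, 27926GB     # from the size calculation below
}
New-VirtualDisk @vdParams
```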

Number of columns



Virtual disk size and storage levels



Total:

Storage Servers: 3
SAS ports / server: 6
SAS paths from the server to each shelf: 2
Disk shelves: 3
Number of pools: 3
Number of VD: 6
Virtual Disks / Pool: 2
HDD: 144 @ 4 TB (~576 TB of raw space), 48/shelf, 48/pool, 16/shelf/pool
SSD: 36 @ 800 GB (~28 TB of raw space), 12/shelf, 12/pool, 4/shelf/pool
Size of a virtual disk: SSD tier + HDD tier = 1110 GB + 27926 GB = 28.4 TB
Total usable capacity: 28.4 × 6 ≈ 170 TB
Overhead: (1 - 170 / (576 + 28)) = 72%

Source: https://habr.com/ru/post/257089/

