
Testing flash storage: the theoretical part


We started testing flash arrays at the request of one of our major customers, who could not settle on a storage solution for their workloads. The topic turned out to be so relevant and interesting that it soon outgrew that one project. Over time we worked out our own methodology, wrote scripts and collected unique factual material, and I wanted to share it with colleagues: honestly, without hype or myths, just facts. This article opens a series of independent publications, each devoted to testing a particular array or related technology. But first, a few words about how SSDs differ from ordinary hard drives (HDDs) and what that means for testing storage systems built on them.

I apologize in advance for stating common truths. A hard drive (HDD) is a motor, platters, heads and a controller. On a read or write, the controller moves the heads to the desired track, waits for the platter to rotate the right sector under them, and transfers the data. With this design, performance is directly tied to the spindle rotation speed and the head seek speed, and both face mechanical and electromechanical limits. Neither has improved significantly for more than a decade: drives with a 15,000 rpm spindle appeared about 12 years ago.
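To see why these mechanical limits cap random performance, a back-of-the-envelope estimate with typical (assumed) figures helps: at 15,000 rpm one revolution takes 60 / 15,000 = 4 ms, so the average rotational delay is about 2 ms; add a typical ~3.5 ms average seek and you get roughly 5.5 ms of service time per random request, that is, on the order of 1000 / 5.5 ≈ 180 IOPS per drive, which is about what 15K drives actually deliver.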

What is usually measured on hard drives?

1. IOPS (the number of I/O operations per second) and latency (response time), measured under a random load with small blocks. The number of IOPS an HDD delivers is determined almost entirely by its mechanics and depends little on the block size (while blocks stay small) or on the read/write mix.

2. Bandwidth under streaming (sequential) I/O. The figures depend weakly on the type of load but noticeably on the position of the heads relative to the center of the platter (Zone Bit Recording: outer tracks hold more sectors and therefore stream faster).
Note that HDD speed does not depend on the load history: the same load yields the same IOPS at the beginning of a test and at its end. HDDs with the same spindle speed from different manufacturers, as a rule, hardly differ in performance: the mechanics are about the same, and the controller long ago ceased to be the factor limiting performance.

Now, back to SSD drives (not necessarily in a disk form factor). A typical SSD consists of one or more controllers and a set of flash memory chips. The chips are organized (very simplistically) into pages (usually 4 KB), which are grouped into larger blocks. Data is always written into free space, sequentially filling free pages, regardless of whether it is new data or a change to existing data. Old copies of modified data are not erased immediately, only marked as obsolete. Removing these obsolete copies is the job of a special process, Garbage Collection (GC), which (in general terms) copies the still-valid pages out of a partially obsolete block into a fresh one and then erases the whole old block, returning it to the pool of free space.
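To make the GC mechanics concrete, here is a deliberately simplified Python sketch (all names and sizes are invented for illustration, not taken from any real firmware):

```python
# Toy model of SSD garbage collection (illustration only).
PAGES_PER_BLOCK = 64

class Block:
    def __init__(self):
        self.pages = []          # logical addresses stored in this block
        self.obsolete = set()    # pages superseded by newer writes

    def live_pages(self):
        return [p for p in self.pages if p not in self.obsolete]

def garbage_collect(blocks, free_blocks):
    """Pick the block with the fewest live pages, relocate them,
    erase the block and return it to the free pool."""
    victim = min(blocks, key=lambda b: len(b.live_pages()))
    target = free_blocks.pop()                # needs spare (over-provisioned) capacity
    target.pages.extend(victim.live_pages())  # the costly internal copy
    victim.pages.clear()                      # "erase" the whole block at once
    victim.obsolete.clear()
    blocks.remove(victim)
    blocks.append(target)
    free_blocks.append(victim)
```

The copy inside `garbage_collect` is the key point: a host write can trigger extra internal writes (write amplification), and the block erase is slow, so sustained write speed is bounded by how fast GC can free blocks.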

Normally Garbage Collection runs in the background while the system is idle, but under a sustained write load it severely limits SSD performance, because GC frees blocks noticeably more slowly than the SSD can write at its peak. This drop in SSD performance under long write loads is known as the write cliff.
SSD manufacturers try to soften the impact of the Garbage Collection process through over-provisioning (hidden spare capacity reserved for GC), TRIM support, write caching and more efficient GC algorithms in the controller firmware.

SSD performance depends heavily on the type of memory chips, the way they are used, the on-drive controllers and the I/O interface. Unlike conventional HDDs, where in practice all drives with the same spindle speed perform comparably, different SSD drives can differ in performance several times over.

What is usually measured on flash drives and flash arrays?

1. IOPS and latency under a random load. Unlike an HDD, the results depend on the block size and on the type of operation, that is, whether we read or write. Accordingly, SSD testing requires groups of runs that sweep the read-to-write ratio and the block size (a sketch of such a test matrix follows this list).

2. How SSD performance changes under a long write load, to determine the peak ("fresh") performance, the steady-state performance after the write cliff, and how long sustained writing takes to reach that steady state.
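A minimal sketch of the test matrix from item 1, assuming the standard fio load generator is installed and `/dev/sdX` is a scratch device that may safely be overwritten (the device name and parameter values are assumptions for illustration):

```python
import subprocess

DEVICE = "/dev/sdX"                 # assumed scratch device; its data will be destroyed
BLOCK_SIZES = ["4k", "8k", "64k"]
READ_PCTS = [100, 70, 50, 0]        # share of read operations in the mix

for bs in BLOCK_SIZES:
    for read_pct in READ_PCTS:
        subprocess.run([
            "fio",
            f"--name=mix{read_pct}_bs{bs}",
            f"--filename={DEVICE}",
            "--rw=randrw",
            f"--rwmixread={read_pct}",
            f"--bs={bs}",
            "--ioengine=libaio",
            "--direct=1",           # bypass the page cache
            "--iodepth=32",
            "--numjobs=4",
            "--time_based",
            "--runtime=120",
            "--group_reporting",
        ], check=True)
```

fio reports IOPS and latency for each combination; in a real methodology each pass would also be preceded by preconditioning of the device.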

Measuring bandwidth under streaming (sequential) I/O is largely pointless here: the SSD architecture fragments data internally anyway, so sequential access has no inherent advantage over random access.

An important point: after each write test that establishes the array's peak performance, you must pause so that the Garbage Collection processes settle and do not skew the next measurement (see the sketch below).
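For item 2, a rough sketch of a write-cliff measurement built around such pauses, again using fio and the assumed `/dev/sdX` device: it strings short write intervals back-to-back during one long sustained write, then idles before the next test series.

```python
import json
import subprocess
import time

DEVICE = "/dev/sdX"      # assumed scratch device
SAMPLES = 30             # 30 x 60 s = half an hour of sustained writing
GC_PAUSE = 1800          # idle seconds before the next test series

iops_log = []
for i in range(SAMPLES):
    out = subprocess.run([
        "fio",
        f"--name=wcliff_{i}",
        f"--filename={DEVICE}",
        "--rw=randwrite",
        "--bs=4k",
        "--ioengine=libaio",
        "--direct=1",
        "--iodepth=32",
        "--time_based",
        "--runtime=60",
        "--output-format=json",
    ], check=True, capture_output=True, text=True).stdout
    iops = json.loads(out)["jobs"][0]["write"]["iops"]
    iops_log.append(iops)
    print(f"minute {i + 1}: {iops:.0f} IOPS")

# A falling curve in iops_log marks the write cliff; the flat
# tail is the performance that GC can actually sustain.
time.sleep(GC_PAUSE)     # let Garbage Collection settle before the next test
```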
The architecture of a flash array and how well its controllers are optimized for SSDs play a decisive role in the performance of the whole array. With a single SSD peaking at around 50,000 IOPS, the array controller itself can become the limiting factor. This often happens when a vendor tries to turn a conventional array into a flash array simply by installing SSDs into it. In addition, the array controller adds noticeable latency of its own, something that used to be invisible on HDD-based systems.

An unoptimized controller can significantly degrade the characteristics of the SSD drives behind it.
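A rough illustration with assumed numbers shows why: if an HDD services a request in about 5 ms, an extra 0.3 ms spent in the array controller adds only about 6% to the response time; for an SSD answering in about 0.2 ms, the same 0.3 ms more than doubles the latency seen by the host.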

Another important point: SSD-based storage is potentially capable of delivering millions of IOPS, so during testing the load generator itself can become the bottleneck. The configuration of the server (or servers) generating the load must take the peculiarities of SSDs into account: I/O schedulers and queue depths must be tuned properly, and the test must be parallelized as much as possible. You are unlikely to reach the figures declared by the manufacturer on a single LUN spanning the whole array.
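A minimal sketch of such host-side tuning on Linux, assuming the array's LUNs show up as `/dev/sdb` and `/dev/sdc` (the device names and values are assumptions; the sysfs paths are the standard block-layer knobs):

```python
from pathlib import Path

LUNS = ["sdb", "sdc"]    # assumed device names for the array's LUNs

for dev in LUNS:
    queue = Path(f"/sys/block/{dev}/queue")
    # "noop" avoids pointless elevator reordering for flash-backed LUNs
    (queue / "scheduler").write_text("noop")
    # deepen the block-layer queue so the array is kept busy
    (queue / "nr_requests").write_text("1024")
```

After that, spread the load: run fio jobs against all LUNs in parallel (one `--filename` per job) rather than pointing everything at a single LUN.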

That, I believe, is enough theory; it is time to move on to practice. Read the next article: Testing Storage IBM RamSan FlashSystem 820.




P.S. The author warmly thanks Pavel Katasonov, Yuri Rakitin and all the other company employees who took part in preparing this material.

Source: https://habr.com/ru/post/227885/

