We have already talked about the implementation of QoS in Dell Compellent systems, but that is not the only innovation in SCOS version 7, so today a few words about the rest.
Multi-tier storage (Data Progression) appeared in Compellent systems quite a long time ago, back when most midrange storage systems did not even offer anything of the kind. As before, data blocks move between tiers on a schedule rather than in real time. This lets you control the load on the system and avoid performance degradation caused by the migration processes, although you do have to take the load profile into account in advance when planning. Data in Compellent systems is organized in pages ranging from 512 KB to 4 MB in size (the default page size is 2 MB), and migration is performed at the level of individual pages, so data can be distributed across the storage tiers more selectively. New data is always written to the fastest tier ("fast" disks in RAID 10 or SSD), which also needs to be taken into account during initial planning.
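To make the page-level approach a bit more concrete, here is a minimal, purely illustrative Python sketch. The tier names, the hotness threshold and the whole model are assumptions for illustration, not Compellent internals: new writes land on the fastest tier, and pages migrate between tiers only when the scheduled job runs, not inline with host I/O.

```python
from dataclasses import dataclass
from enum import Enum

PAGE_SIZE = 2 * 1024 * 1024  # default page size: 2 MB (configurable from 512 KB to 4 MB)

class Tier(Enum):
    TIER1_SSD_OR_RAID10 = 1   # fastest tier: SSD or RAID 10 on fast disks
    TIER2_FAST_DISK = 2
    TIER3_CAPACITY = 3

@dataclass
class Page:
    page_id: int
    tier: Tier = Tier.TIER1_SSD_OR_RAID10  # new writes always land on the fastest tier
    reads_last_cycle: int = 0

def scheduled_data_progression(pages, hot_threshold=100):
    """Runs on a schedule (not inline with host I/O): demote cold pages, promote hot ones."""
    for page in pages:
        if page.reads_last_cycle < hot_threshold and page.tier != Tier.TIER3_CAPACITY:
            page.tier = Tier(page.tier.value + 1)      # cold page: demote one tier down
        elif page.reads_last_cycle >= hot_threshold and page.tier != Tier.TIER1_SSD_OR_RAID10:
            page.tier = Tier(page.tier.value - 1)      # hot page: promote one tier up
        page.reads_last_cycle = 0                      # reset statistics for the next cycle
```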
Data compression was added back in SCOS version 6.5.1. It also runs offline, in parallel with the redistribution of pages across storage tiers. To explain how Data Reduction works in Dell Compellent storage, we first need to mention the three types of data pages used by the system (their lifecycle is sketched below the list):
● "Active" (active pages) - recently written data. These pages always reside on the fastest storage tier, and the data in them can be both read and overwritten.
● "Frozen accessible" - pages frozen by a snapshot. The data in them has not yet been overwritten, so it is still available for reading; but as soon as there is an attempt to change it, a new active page is created and this page switches to the "frozen inaccessible" state.
● "Frozen inaccessible" - pages whose data was overwritten after the snapshot was taken. Their data can no longer be read by accessing the volume directly (only by restoring snapshots), while the overwritten blocks now live in active pages on the fastest storage tier.
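A rough way to picture these three states is the toy state machine below. The names and transitions are my illustration of the description above, not SCOS code: a snapshot freezes active pages, and a write against a frozen page leaves it intact and redirects the new data to a fresh active page.

```python
from enum import Enum, auto

class PageState(Enum):
    ACTIVE = auto()              # recently written, readable and overwritable, fastest tier
    FROZEN_ACCESSIBLE = auto()   # frozen by a snapshot, still serves reads for the volume
    FROZEN_INACCESSIBLE = auto() # superseded by newer writes, reachable only via snapshots

class Page:
    def __init__(self, data):
        self.data = data
        self.state = PageState.ACTIVE

def take_snapshot(pages):
    """A snapshot freezes every currently active page."""
    for p in pages:
        if p.state is PageState.ACTIVE:
            p.state = PageState.FROZEN_ACCESSIBLE

def overwrite(page, new_data):
    """Overwriting a frozen page redirects the write to a new active page."""
    if page.state is PageState.ACTIVE:
        page.data = new_data
        return page
    page.state = PageState.FROZEN_INACCESSIBLE   # old version now visible only via snapshots
    return Page(new_data)                        # the new data lives in a fresh active page
```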
In all Data Progression and Data Reduction processes, only "frozen" pages are involved. A regular snapshot is created daily on a schedule (at that moment all active pages become "frozen") and migration between the tiers begins. Compression always works with 64K blocks, regardless of the page size. If, after applying the compression algorithms, it turns out that the compressed data together with the required metadata takes up more space than the uncompressed data, compression is not applied to that page. Since the system operates with pages, at the final stage of the Data Progression process partially filled pages are consolidated to free up disk space. SCOS can also recognize blocks filled with zeros and replace them with references, which allows for even higher data reduction rates.
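The "compress only if it actually saves space" rule can be illustrated with a short sketch. Here zlib is just a stand-in for whatever algorithm the array really uses, and the per-block metadata overhead value is an assumption: each 64K block is compressed, the compressed form is kept only if it plus its metadata is smaller than the original, and zero-filled blocks are replaced with a reference instead of being stored.

```python
import zlib

BLOCK_SIZE = 64 * 1024        # compression works on 64K blocks regardless of page size
METADATA_OVERHEAD = 128       # assumed per-block metadata cost, purely illustrative

def reduce_block(block: bytes):
    """Return ('zero' | 'compressed' | 'raw', payload) for one 64K block."""
    if block.count(0) == len(block):
        return ("zero", b"")                       # zero-filled block: store only a reference
    compressed = zlib.compress(block)
    if len(compressed) + METADATA_OVERHEAD < len(block):
        return ("compressed", compressed)          # worth it: keep the compressed form
    return ("raw", block)                          # compression did not pay off: keep as-is

def reduce_page(page: bytes):
    """Split a frozen page into 64K blocks and reduce each one independently."""
    return [reduce_block(page[i:i + BLOCK_SIZE]) for i in range(0, len(page), BLOCK_SIZE)]
```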
But the use of SSD drives and the spread of All-Flash systems call for more active use of data optimization technologies to reduce storage costs. Therefore, in SCOS 7 Data Reduction was supplemented with deduplication. Since all optimizations happen in parallel with moving data between tiers, the tiering (Data Progression) license is required for both compression and deduplication. All-Flash arrays with a single storage tier are an exception: for them you do not have to buy this license. Note that if no SSD disks are used in the array, then neither compression nor deduplication can be used. At first glance this seems strange, because both technologies work on pages that could have been moved off the fastest storage tier onto regular disks. That is true - compressed data can be stored on regular disks, but accessing it always requires an additional lookup of service metadata. Therefore SSD disks (at least 6 of them) are also needed to store this metadata: if it were placed on regular disks, there would be a significant performance penalty when accessing compressed data.
Deduplication is supported only on controllers with sufficient processor performance - SC4020, SC7000, SC8000, SC9000. Systems based on SC40 and SCv2000 are not supported. Dedicated processor cores in the controllers are used for data optimization, so in most cases the background processes have no noticeable effect on I/O performance. And you can pause the Data Reduction processes at any time if they really do start to affect the speed of the system.
The optimization processes run in the background, either on a schedule or as soon as you take a new snapshot of the volume. After the snapshot is taken, an on-demand Data Progression process starts and, as part of it, compression and deduplication are launched (if, of course, they are enabled for this volume). There is a significant difference between the two options: the daily run optimizes all frozen pages, while the "quick" one processes only the pages that were frozen when that snapshot was created. As a result, this background process puts less load on the system during working hours.
Deduplication, in contrast to compression, works with 4KB blocks. The implementation principle is standard: as in other systems, a hash is computed for each block and compared against a dictionary. If that hash already exists in the dictionary, the data block is replaced with a reference, which saves disk space. After deduplication, compression starts; at this stage only the remaining unique 4K blocks are compressed. Everything said about compression earlier remains valid - in some cases compression may not give the desired effect, and then the page is written to disk "as is".
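As a rough illustration of this dedup-then-compress order, here is another sketch. SHA-256 and zlib are stand-ins chosen for the example; the real fingerprinting and dictionary structures are not documented here. Each 4K block is hashed and looked up in a dictionary, duplicates become references, and only the remaining unique blocks are handed to compression.

```python
import hashlib
import zlib

DEDUP_BLOCK = 4 * 1024   # deduplication works on 4K blocks

def dedup_and_compress(data: bytes, dictionary: dict):
    """Deduplicate 4K blocks against a hash dictionary, then compress the unique ones."""
    layout = []
    for i in range(0, len(data), DEDUP_BLOCK):
        block = data[i:i + DEDUP_BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        if digest in dictionary:
            layout.append(("ref", digest))           # duplicate: store a reference only
        else:
            compressed = zlib.compress(block)
            stored = compressed if len(compressed) < len(block) else block
            dictionary[digest] = stored              # unique block: compress if it helps
            layout.append(("unique", digest))
    return layout
```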
For each volume, you can choose which technologies to use: compression alone, compression paired with deduplication, or Data Reduction disabled altogether. It is impossible to enable deduplication without compression. Depending on which storage tiers are in use, optimization behaves differently.
As is usually the case with a new feature, there are some implementation details to keep in mind. Array-level replication is supported for compressed and deduplicated volumes, but at the moment of the actual transfer the data is decompressed in the controller's memory (the data on the disks remains compressed) and only uncompressed data is sent "outside". You can separately enable deduplication for replication, but it works only while sending data to the remote system, and on the target the data is "rehydrated". If you want to change the controller that owns a volume with Data Reduction enabled, you have to turn off compression and deduplication beforehand, wait until the data is fully rehydrated (after the next Data Progression cycle), and only then move ownership to the other controller.
Yes, there was a time when multi-tier storage in the Dell Compellent could be positioned as a new and unique solution. But now, with All-Flash gaining popularity, you cannot leave the system without data optimization capabilities - the price per GB becomes too high compared to competitors. The appearance of the new functionality in SCOS is good news, and since the update is supported on controllers already in service, customers can start using their existing storage systems more efficiently. To what extent periodic (post-process) deduplication is better or worse than a permanent (inline) one is an open question, and each project will have its own answer. The right approach is to test the equipment before purchase rather than rely only on vendors' marketing statements when choosing a solution.
Trinity engineers will be happy to advise you on server virtualization, storage systems, workstations, applications, and networks.
Visit the popular technical forum of Trinity or order a consultation.
Other Trinity articles can be found on the Trinity blog and hub. Subscribe!