
Bench press: comparing HPE StoreOnce and EMC Data Domain

This article presents the results of comparing two backup storage solutions that my colleagues at Onlanta and I tested while choosing options for upgrading the OnCloud.ru cloud backup system.

Storing backups is an integral part of the backup process. Traditionally, two storage tiers were used for this: disks + tapes. But when building new infrastructures or upgrading old ones, customers increasingly abandon tape libraries in favor of disk-only systems. This is driven not so much by falling disk prices as by the greater efficiency of storing virtual infrastructures thanks to deduplication and compression, by the integration of virtual-environment backup with the storage systems that host those environments, and by the possibility of near-instant recovery of virtual machines directly from a backup sitting on a disk library.

Backup storage here means a combined software and hardware appliance whose key features are deduplication and data compression. These let us keep more backup copies on disk, increasing retention depth on media faster than tape. We can abandon tape entirely, except where removable media are required.
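
To put a number on this, here is a minimal back-of-the-envelope sketch of how a combined deduplication/compression ratio stretches retention depth. The pool size, daily change rate, and the 5:1 ratio are assumptions chosen purely for illustration, not vendor-guaranteed figures.

```python
# Back-of-the-envelope sizing: how data reduction stretches retention
# depth on the same disk pool. All figures are illustrative assumptions.

def restore_points(pool_tb: float, full_tb: float, daily_change_tb: float,
                   reduction_ratio: float) -> int:
    """Daily restore points that fit: one full backup plus N increments."""
    effective_tb = pool_tb * reduction_ratio  # reduction multiplies capacity
    if effective_tb < full_tb:
        return 0
    return int((effective_tb - full_tb) // daily_change_tb) + 1

# 20 TB disk pool, 2.5 TB of source VMs, ~5% (0.125 TB) daily change:
print(restore_points(20, 2.5, 0.125, 1.0))  # plain disk: 141 restore points
print(restore_points(20, 2.5, 0.125, 5.0))  # 5:1 reduction: 781 points
```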

Another very important point is that the software part of the complex can be purchased separately as a virtual appliance (Virtual Storage Appliance (VSA) from HPE and Virtual Edition (VE) from Dell/EMC), which you deploy in your own virtual infrastructure on existing storage, or on any other storage of your choice. Using the virtual appliance imposes no restrictions on the underlying storage system, and it does not matter at all how that storage is connected: FC or iSCSI. Inexpensive NL-SAS disks are traditionally used for these tasks: they are capacious and cheap, and in addition they deliver high throughput on the sequential write/read operations that characterize the backup process.
Our goal was to evaluate the solutions' compression and deduplication capabilities; we did not set out to test performance. Testing was conducted on production Linux and Windows virtual machines running in our cloud.

But let's dive a bit into history. Data Domain was founded in 2001 and began building disk storage that offered data compression while outperforming tape libraries. The product was so compelling and well made that leading vendors, including NetApp and EMC, bid to acquire the company. In 2009, EMC's bid won out, and it acquired Data Domain; EMC later fully integrated Data Domain into its Data Protection Suite platform.

HPE took a different path: in 2010 it presented its own approach to deduplication at the HP Technology Forum conference. The technology, developed by HP Labs, was named StoreOnce and was applied to the arrays of that era, the StorageWorks D2D line, which had previously relied on third-party software. In 2012 the StorageWorks D2D line was rebranded StoreOnce, the name under which it exists to this day.

Incidentally, the disk-library market also includes Quantum, with its DXi-V product line; it exists only as a combined software and hardware appliance and therefore does not take part in today's comparison. There is also ExaGrid, which has been developing similar solutions with its own protocol since 2002, but it is barely represented on the Russian market.

According to IDC, at the end of 2016 Dell/EMC held about 60% of the global backup appliance market, several times the share of its closest competitors.



The main results of our comparison of the functionality of the Dell/EMC and HPE virtual appliances are presented in the table below.


I would note right away that the larger maximum capacity of the Dell/EMC virtual appliance is a plus. You can, of course, purchase and deploy several appliances, but deduplication works only within a single one, so a larger volume is an undoubted advantage here: it saves space more efficiently.
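
A toy simulation illustrates why this domain boundary matters: identical blocks written to different appliances are stored once per appliance. The synthetic data and the simple hash-based block index below are assumptions for the sketch; real appliances use variable-length chunking.

```python
# Toy model of the deduplication domain boundary: the same logical
# stream stored in one appliance vs. split across two. Synthetic data.
import hashlib
import random

random.seed(42)
# 10,000 logical blocks drawn from only 2,000 distinct patterns,
# i.e. heavy cross-VM redundancy (think cloned OS images).
blocks = [random.randrange(2000) for _ in range(10_000)]

def stored(stream):
    """Blocks physically kept after dedup inside a single appliance."""
    return len({hashlib.sha256(str(b).encode()).digest() for b in stream})

one_pool = stored(blocks)
two_pools = stored(blocks[0::2]) + stored(blocks[1::2])  # round-robin split
print(f"one appliance:  {one_pool} blocks stored")
print(f"two appliances: {two_pools} blocks stored")  # nearly twice as many
```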

If your backup volumes are modest, or you want to deploy a virtual appliance as a test bench, the HPE solution is more attractive: its free tier is 1 TB and, under the terms of the license agreement, may be used in production environments (though without the ability to update it). Dell/EMC, in turn, provides only 500 GB for free (in all cases we are talking about usable capacity), and it may not be used in commercial production environments. A 50 TB version from HPE is available for a 60-day trial.

These systems also let you build tiered storage by moving backups to the cloud. Cloud Tier from Dell/EMC can expand system capacity severalfold and supports encryption; it works with Dell/EMC's own cloud services as well as Amazon S3 and Azure Storage. For more information, see the Dell/EMC video.

HPE's equivalent functionality is called HPE StoreOnce CloudBank and works with the Azure and Amazon clouds; their video is worth a look.

If you plan to deploy a hardware solution from HPE or Dell/EMC, the vendors strongly recommend first trialing the virtual appliance to see what level of deduplication/compression you can achieve on your production data. This helps both in judging the solution's effectiveness in your environment as a whole and in sizing the system more accurately.

The main killer feature of these solutions is a data transfer protocol that deduplicates data on the backup server and therefore transfers only unique data blocks.

This increases overall backup speed, because non-unique blocks simply never reach the array. The feature requires a separate license from both HPE and Dell/EMC.
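
The sketch below shows the general idea, not the actual DD Boost or Catalyst wire format (both are proprietary): the backup server fingerprints each chunk, asks the target which fingerprints it already holds, and ships only the missing ones. The fixed chunk size, SHA-256 hashing, and in-memory index are assumptions for illustration.

```python
# Minimal sketch of source-side deduplication: fingerprint chunks on the
# backup server, transfer only the blocks the target has never seen.
import hashlib

CHUNK = 64 * 1024  # fixed-size chunks for simplicity; real systems vary

class Target:
    """Stands in for the appliance's fingerprint index."""
    def __init__(self):
        self.store = {}

    def missing(self, fingerprints):
        return {fp for fp in fingerprints if fp not in self.store}

    def write(self, fp, data):
        self.store[fp] = data

def backup(data: bytes, target: Target) -> int:
    """Send `data` to `target`; return the bytes actually transferred."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    fps = [hashlib.sha256(c).digest() for c in chunks]
    wanted = target.missing(fps)          # one round-trip: what's new?
    sent = 0
    for fp, chunk in zip(fps, chunks):
        if fp in wanted:
            target.write(fp, chunk)
            wanted.discard(fp)            # each unique chunk goes once
            sent += len(chunk)
    return sent

target = Target()
full = b"A" * CHUNK * 100                      # highly redundant payload
print(backup(full, target))                    # first full: 1 chunk sent
print(backup(full + b"B" * CHUNK, target))     # next run: only the new chunk
```

Because the round-trip exchanges fingerprints rather than data, a mostly-unchanged full backup costs little more on the wire than an incremental.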

On our test bench we deployed Data Domain Virtual Edition and HPE StoreOnce Virtual Storage Appliance. The datastores for these virtual appliances were LUNs on an IBM Storwize V7000 array with NL-SAS disks, connected to the hypervisor host over Fibre Channel. Our virtual infrastructure is built on VMware, and the virtual appliances communicate with the cluster nodes over the LAN.

Once again: maximum performance was not our goal. We were interested in the disk-space savings from deduplication and compression, since our test environment is not well suited to probing the backup system's performance ceiling.

We compared three options:

  1. Veeam backup software + Data Domain storage using the DD Boost protocol,
  2. Veeam backup software + StoreOnce storage using the Catalyst protocol,
  3. Veeam with its own built-in deduplication.

Why was Veeam chosen as the backup software? I have worked with the product for about three years, and Onlanta is a Veeam partner. The software is also extremely easy to deploy and, without any additional tuning, can deliver maximum speed, as we have seen from our own experience. And the quality of Veeam's support service has always been excellent.


8 Gb/s backup speed over the SAN

The third option is included mainly to demonstrate the effectiveness of deduplicating appliances in principle; nobody expects Veeam to match HPE or Dell/EMC here. Still, it was interesting to compare Veeam's compression levels, of which there are five, ranging from completely off to Extreme. Note also that Veeam deduplicates only within a single job, which is naturally less effective. We performed a full backup, then several days of incremental backups, then another full.
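
As a rough illustration of the level-versus-CPU tradeoff, the sketch below runs zlib at increasing effort levels over semi-repetitive synthetic data. Veeam's five levels use its own codecs, so zlib here is only a stand-in for the shape of the curve.

```python
# Illustrating the compression-level tradeoff: higher effort, better
# ratio, more CPU time. zlib stands in for Veeam's own codecs.
import time
import zlib

# Semi-repetitive synthetic "backup" data (~2.8 MB).
data = (b"vm-disk-block " * 1000 + bytes(range(256))) * 200

for level in (0, 1, 6, 9):  # 0 = store only, 9 = maximum effort
    start = time.perf_counter()
    out = zlib.compress(data, level)
    ms = (time.perf_counter() - start) * 1000
    print(f"level {level}: ratio {len(data) / len(out):6.2f}, {ms:6.1f} ms")
```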

Test: Databases


We took several production virtual machines running various databases on Windows and Linux. The total VM size was 2.5 TB.


Test: Terminal Servers


The situation here is similar to the previous test, except that all the servers ran Windows. The total VM size was 2.2 TB.


As expected, the virtual appliances were more efficient than Veeam alone. A 1:5 ratio is a good result, especially since it stays fairly stable across compressible data types. Naturally, with greater backup retention depth this figure will climb higher still.
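
In disk terms, simple arithmetic on the totals quoted above (assuming the 1:5 ratio held for both test sets):

```python
# What a 1:5 reduction means for our test sets (illustrative arithmetic).
for name, source_tb in (("databases", 2.5), ("terminal servers", 2.2)):
    print(f"{name}: {source_tb} TB logical -> ~{source_tb / 5:.1f} TB on disk")
```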

A few more nice touches


Besides working with traditional backup software, both products can also interact with some systems directly via plug-ins. These include:


This is convenient and economical, and it removes an extra layer of software from the backup process.

Fault tolerance


During testing we also checked how the systems handle failure, specifically loss of the storage on which they reside. Both VMs crashed and failed to come back up after a reboot. Most likely the failure was caused by loss of the array's write-back cache during the abrupt shutdown.

Screenshots of errors after reboot

Data Domain


StoreOnce


We see three ways to address fault tolerance.

  1. The most expensive: move to a hardware appliance, which avoids the problem of data loss/inconsistency on power failure, because the disks sit directly in the system rather than on external storage.

  2. The easiest: VM snapshots or backups of the appliance itself. If the virtual machine breaks, roll it back and keep working. The downside is that we lose information about backups made since the appliance's own last backup.

  3. Replication to a second array: we deploy two virtual appliances on different storage systems and replicate between them. A "hybrid" variant is also possible, with one of the replication partners being a physical appliance. This also makes it possible to centralize reliable storage of branch-office backups in a trusted data center.

In my opinion, replication is the best way to protect the data. Copies can and should be distributed not just across different compute clusters and different storage systems, but across different sites. The result is three copies of the data, enough to recover from just about any problem.

In closing, I should stress that I am deliberately not declaring which solution is better or which one to choose. The products are very similar, and our tests produced very close results. Start from the final price, from the arrays already in your infrastructure, from the backup software you already use (not every product supports both protocols), and from your own preferences or familiarity with a vendor's equipment.

I hope today's review has helped you get better acquainted with this class of device and decide whether your infrastructure needs one.

Source: https://habr.com/ru/post/330856/

