Today the role of information technology in the business processes of modern enterprises can hardly be overestimated. The deeper IT is integrated into those processes, the more valuable the processed data becomes and the more costly its loss. As a result, data protection, archiving and storage now concern not only system administrators but also business managers and owners.
Key Data Protection Issues
When an unforeseen failure (accident) occurs, two parameters need to be minimized: the amount of data lost and the recovery time. The amount of lost data depends directly on the time elapsed between the last saved state of the system and the moment of the failure. To minimize it, backups must be made as often as possible, which in turn increases the already growing volume of stored data. Organizing backups has therefore become the system administrator's main task.
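The trade-off between lost data and stored volume can be illustrated with a rough calculation. The sketch below assumes a constant rate of data change and that every backup is a full copy kept for a fixed retention period. The function names and all figures are hypothetical and chosen only for illustration.

```python
def worst_case_loss_gb(backup_interval_hours: float,
                       change_rate_gb_per_hour: float) -> float:
    """Data changed since the last backup is lost if the failure
    happens just before the next backup would have run."""
    return backup_interval_hours * change_rate_gb_per_hour


def stored_volume_gb(backup_interval_hours: float,
                     retention_days: float,
                     full_copy_gb: float) -> float:
    """Volume kept on the backup storage when every backup is a
    full copy and nothing is deduplicated."""
    copies = retention_days * 24 / backup_interval_hours
    return copies * full_copy_gb


if __name__ == "__main__":
    # Assumed figures: 5 GB of changes per hour, a 500 GB full copy,
    # copies retained for 7 days.
    for interval in (24, 6, 1):
        loss = worst_case_loss_gb(interval, 5)
        volume = stored_volume_gb(interval, 7, 500)
        print(f"every {interval:>2} h: lose up to {loss:.0f} GB, "
              f"store {volume:.0f} GB")
```

Shortening the interval reduces the potential loss in proportion, but without deduplication the stored volume grows by the same factor.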
Alongside the avalanche-like growth of data, backup windows are shrinking. This is caused by longer business hours, branches opening in other time zones, rising server load and similar factors. All of this raises the requirements for network bandwidth and storage performance and increases the number of backup jobs that run in parallel.
Under these conditions, tape backup can no longer keep up with the increased requirements and ceases to be a solution. Backing up to plain hard drives, despite their growing capacity and falling cost per megabyte, is still too expensive. In this situation, deduplication devices are perhaps the only devices capable of solving the backup problem.
Deduplication
Deduplication devices store only unique data and eliminate duplicates. They are effective because a significant part of the data, often most of it, is duplicated again and again. Let's look at some simple examples.
When you make a daily image copy of a server or virtual machine, you store the same set of files each time, plus whatever changed during the day. In most cases the amount of changed data is small, yet each image copy takes up almost the same space in the file storage. With deduplication, the original image is stored once, and for each subsequent copy only the differences from it are stored. The savings grow significantly if you have many servers or virtual machines of the same type.
Deduplication is also highly effective for database backups. Here it is not only a matter of storing just the changed data, but also of the many identical records within the database itself.
Generally speaking, even a low deduplication ratio saves a large amount of disk space, as the graph below shows. In practice, the ratio in backup tasks is rarely lower than 10:1, which means a 90% saving in backup storage space.
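The link between the deduplication ratio and the space saved is simple arithmetic: with a ratio of N:1, only 1/N of the original volume is stored, so the saving is 1 - 1/N. A minimal sketch follows; the function name is hypothetical.

```python
def space_saving(ratio: float) -> float:
    """Fraction of storage saved for a deduplication ratio of ratio:1."""
    if ratio < 1:
        raise ValueError("a deduplication ratio cannot be below 1:1")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    for ratio in (2, 5, 10, 20):
        print(f"{ratio}:1 saves {space_saving(ratio):.0%} of the space")
```

Most of the benefit arrives at low ratios: 2:1 already halves the required space, while going from 10:1 to 20:1 adds only five percentage points.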

Fujitsu deduplication solutions
Fujitsu solutions split data into blocks of variable length, and each block then receives a unique signature (checksum). Block boundaries are also found inside files, which makes it possible to detect duplicate data within Microsoft Exchange files and databases. Next, a signature list is compiled that contains the signatures themselves and the positions of the blocks with each signature in the original data sequence. At the end of the deduplication process, only the unique blocks and the signature list are written to the hard drives. That is, if a new data block matches a previously recorded one, it is not written again; instead, a pointer to the existing block is added. The unique data blocks are then automatically compressed using standard archiving algorithms, which further reduces the space required for backups.
The process described is performed inline. That is, all of these stages run on the fly, invisibly to the user, so the remaining free space can be determined immediately, because data is written to the disks already deduplicated. Abandoning offline (post-process) deduplication became possible thanks to the increased performance of modern processors and memory subsystems used in ready-made server solutions.
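The stages described above (variable-length blocks, signatures, a signature list with pointers, compression of unique blocks) can be sketched in a few lines. This is a toy illustration of the general technique, not Fujitsu's implementation: the chunking rule, the constants, the use of SHA-256 and zlib, and all names are assumptions made for the example.

```python
import hashlib
import random
import zlib

MIN_CHUNK = 32    # assumed minimum block length, in bytes
MAX_CHUNK = 256   # assumed maximum block length, in bytes
MASK = 0x3F       # cut when the low 6 bits of the running hash are all set


def split_variable(data: bytes) -> list:
    """Split data into variable-length blocks. Boundaries depend on the
    content itself, so they survive insertions earlier in the stream."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (rolling & MASK) == MASK
        if at_boundary or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class DedupStore:
    """Keeps each unique block once (compressed), keyed by its signature."""

    def __init__(self):
        self.blocks = {}  # signature -> compressed unique block

    def write(self, data: bytes) -> list:
        """Deduplicate inline and return the signature list for this stream."""
        signatures = []
        for chunk in split_variable(data):
            sig = hashlib.sha256(chunk).hexdigest()
            if sig not in self.blocks:           # new block: store it once
                self.blocks[sig] = zlib.compress(chunk)
            signatures.append(sig)               # known block: pointer only
        return signatures

    def read(self, signatures: list) -> bytes:
        """Rebuild the original stream from its signature list."""
        return b"".join(zlib.decompress(self.blocks[s]) for s in signatures)


if __name__ == "__main__":
    rng = random.Random(0)
    day1 = bytes(rng.randrange(256) for _ in range(4096))
    day2 = day1[:2000] + b"CHANGED" + day1[2000:]   # small change next day

    store = DedupStore()
    list1, list2 = store.write(day1), store.write(day2)
    assert store.read(list2) == day2
    print(f"blocks referenced: {len(list1) + len(list2)}, "
          f"unique blocks stored: {len(store.blocks)}")
```

Because the second stream is deduplicated while it is being written, the store never holds a full second copy, which corresponds to the inline mode described above.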
ETERNUS CS800 Data Backup System
One of the best solutions to the deduplication problem is the Fujitsu ETERNUS CS800. It is an integrated hardware and software appliance, i.e. a complete solution that does not require the purchase of additional licenses, components or software. The ETERNUS CS800 is aimed at small and medium-sized businesses (SMB) and at regional offices of large companies. Its capabilities allow it to be used in place of tape libraries as well as alongside them.
ETERNUS CS800 comes in two models: Entry and Scale. Entry is available in two versions, with 4.8 TB or 9.6 TB of storage capacity. Note that these figures are the capacity actually available to the user; the RAID overhead is already taken into account. Fujitsu quotes user-accessible capacity in this way for the entire ETERNUS CS line. The Scale model is expandable by connecting additional disk storage. Depending on the hard drives used, the following capacity ranges are available: 8-80 TB (with 1 TB disks), 16-160 TB (with 2 TB disks) and 24-240 TB (with 3 TB disks). In hardware terms, the ETERNUS CS800 consists of a Fujitsu PRIMERGY rack-mount dual-bin server, and the Scale models additionally include a Fujitsu ETERNUS DX 80 storage system with attached disk shelves. The number of shelves can range from 1 to 10 and can be increased dynamically while the system is running. The storage systems use RAID 6, which preserves data if up to two disks fail in each disk group of the array.

ETERNUS CS800 can be accessed through CIFS/NFS over Ethernet, as a VTL over Fibre Channel, or through Symantec OST. Setup and integration into an existing enterprise environment should therefore not be a problem.
Another feature of ETERNUS CS800 is support for the Path To Tape (PTT) function, which makes it possible to connect physical tape libraries and offload data to them for long-term storage rather than day-to-day use, for example annual or monthly copies. The data is written to tape in non-deduplicated form, so it can be read without going through the deduplication device.
ETERNUS CS High End Data Backup System
Despite its merits, ETERNUS CS800 has some limitations. Its scalability is limited to increasing the disk capacity of the storage system (in the Scale model); the number of deduplication processors cannot be scaled at all. For businesses and organizations for which the performance of a single processor module is insufficient, Fujitsu offers the ETERNUS CS High End deduplication device.
This ready-made solution can grow flexibly with your company, up to 10 processor modules that manage incoming data streams and 16 internal RAID systems (3.6 PB of data). Such a system can process up to 400 TB of data per day. It also supports up to 10 devices to which data can be offloaded, such as physical tape libraries, thereby integrating tape libraries into a common backup and archiving system. The system is highly resilient because it has no single point of failure: it continues to function as long as at least one deduplication processor is operational.
An ETERNUS CS High End system is not necessarily very expensive. A configuration with a single compute node costs only slightly more than lower-class systems, yet it gives you almost unlimited possibilities for expanding the system as business needs grow.