
Trying out deduplication and compression for backups


Overseas merchants used to say that deduplication can significantly reduce the space needed to store backups. Supposedly, over in those overseas lands, a whole year of backup history fits into the same amount of space that is occupied on the production servers. As if you could copy 10 terabytes of data every day for a whole year, and the backup storage device would still hold only those same 10 terabytes. Tall tales, I reckon.

However, there is a simple way to check how well the backup data of our own servers, specifically, can be packed into a storage with deduplication and compression. There is no need to deploy the whole backup system: it is enough to download one small (4 MB) utility that will not only show how well our data deduplicates right now, but also forecast how much storage we will need in the future.
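To get a feel for what such an assessment does conceptually, here is a minimal Python sketch of block-level deduplication followed by compression. The block size, hashing scheme, names, and toy data are illustrative assumptions on my part, not the tool's actual internals:

    # Illustrative sketch only: fixed-size block dedup + zlib compression.
    import hashlib
    import zlib

    BLOCK_SIZE = 16 * 1024  # hypothetical 16 KB deduplication block

    def estimate_store_size(data: bytes):
        """Return (bytes after dedup, bytes after dedup + compression)."""
        unique = {}
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            # keep only the first copy of each distinct block
            unique.setdefault(hashlib.sha256(block).hexdigest(), block)
        deduped = sum(len(b) for b in unique.values())
        compressed = sum(len(zlib.compress(b)) for b in unique.values())
        return deduped, compressed

    # Two "daily backups" that differ in a single block pack far smaller
    # than their combined 2 MB:
    day1 = bytes(1024 * 1024)                      # 1 MB of zeros
    day2 = day1[:-BLOCK_SIZE] + b"x" * BLOCK_SIZE  # one changed block
    print(estimate_store_size(day1 + day2))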


For starters, download the utility from here:

http://downloads.arcserve.com/tools/RPSPlanning/unsupported/DeduplicationPlanningTool/V1.0/ArcserveUDPDataStoreCapacityPlanningTool.zip

Although the archive is small, the utility has demanding system requirements:


If we have everything required, unpack the archive anywhere and run ArcserveDeduplicationAssessment.exe.

Then we add the servers we are interested in to the list by clicking the “Add Node” button:



After that, a probe program is installed remotely on the server we specified; it can be seen in the list of services:



By the way, when you finish working with the utility, it will offer to remove the probe program:



Now let's start collecting statistics by clicking the “Scan Nodes” button.

By the way, how many resources does collecting statistics consume on a production server?
The documentation gives an example in which a server with an i7-4790 processor (3.6 GHz, 4 cores) was loaded at 25–30% for 22 minutes to process data from a 199 GB disk.

By default, the statistics collection task runs at low priority, yielding processor time to higher-priority tasks.

This can be changed if statistics collection is going too slowly.
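If you would rather adjust the priority by hand than through the utility's setting, a sketch using the third-party psutil library on Windows might look like the following. The process name “probe.exe” is a placeholder of mine; check the real name in the service list first:

    # Hedged sketch: bumping the probe's priority with psutil on Windows.
    import psutil

    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == "probe.exe":  # placeholder name
            # Windows-only priority constant exposed by psutil
            proc.nice(psutil.ABOVE_NORMAL_PRIORITY_CLASS)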


The screen shows the percentage of work completed on each of the scanned servers:



When statistics collection completes, go to tab 2 and build a report. It makes sense to tick all the dates on which statistics were collected; this lets us see how the data changes over time:



Now, on tab 3, we can use the collected data and, by playing with the parameters, determine how much backup storage we need and how to configure the Arcserve UDP backup storage server.

In the example below, we see the following:


As a result, we find that storing 31 backup copies of these machines will require 78.85 GB of disk space, which means savings of 94%:

(You can also see the RAM requirements for the Arcserve UDP backup server: in this case you will need 1.19 GB of RAM, or 0.06 GB of RAM combined with 1.19 GB of space on an SSD.)



Clicking “Show Details” reveals a more detailed breakdown.

If we make only full backups (“Full Always”), deduplication reduces their total size (1282.99 GB) by 91%, to 118.90 GB.

Compression reduces this amount by roughly another 34%, to 78.85 GB.
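As a quick sanity check, the report's arithmetic can be reproduced in a few lines of Python (all figures are taken from the report above):

    # Figures from the report above.
    total = 1282.99        # GB: 31 full copies before any savings
    after_dedup = 118.90   # GB after deduplication
    after_zip = 78.85      # GB after deduplication + compression

    print(f"dedup savings: {1 - after_dedup / total:.0%}")     # ~91%
    print(f"compression:   {1 - after_zip / after_dedup:.0%}") # ~34%
    print(f"total savings: {1 - after_zip / total:.0%}")       # ~94%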



If we use backup in “Incremental Forever” mode (only incremental backups after a single full backup), the space required for storing backups does not change and still amounts to 78.85 GB. There is simply less deduplication computation to do, so the load on the production servers is lower:



Now let's look at the tab with graphs.

Select the “Disk and Memory Usage Trend” graph type.

It is clearly visible that when a second 35 GB backup is added to the first (also 35 GB), 70 GB of storage is required, as the blue line on the left graph shows.

However, if we use deduplication, the storage requirements for backups drop significantly. The green, orange, and purple lines show the required volumes depending on the compression level used in conjunction with deduplication.

The right graph shows how the need for RAM (or RAM combined with an SSD) on the Arcserve UDP backup storage server grows.



If we select the “Disk and Memory Usage” graph type, we can see how the block size used for deduplication affects memory needs. Increasing the block size somewhat reduces the efficiency of deduplication, but it also reduces the fast-memory (RAM or SSD) requirements on the Arcserve UDP backup storage server:
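A back-of-the-envelope model (my assumption, not Arcserve's documented formula) shows why this trade-off exists: the deduplication store keeps one hash-index entry per unique block, so larger blocks mean fewer entries and a smaller index in RAM or on SSD:

    # Assumed model: one fixed-size index entry per unique block.
    ENTRY_BYTES = 64  # assumed size of one index entry

    def index_gb(unique_data_gb, block_kb):
        blocks = unique_data_gb * 1024 ** 3 / (block_kb * 1024)
        return blocks * ENTRY_BYTES / 1024 ** 3

    for kb in (4, 8, 16, 32, 64):
        print(f"{kb:>2} KB blocks -> ~{index_gb(1000, kb):.1f} GB of index "
              f"per TB of unique data")

With these assumptions, roughly 79 GB of unique data needs about 1.2 GB of index, which is in the same ballpark as the 1.19 GB RAM figure the report showed above.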



The statistics are not deleted when you exit the program, even if you remove the probes from the production servers. They can be used later to build graphs reflecting how memory needs change over time.

The described utility is included in the Arcserve UDP distribution and is installed with it into the “...\Program Files\Arcserve\Unified Data Protection\Engine\BIN\Tools\RPS Planning” directory, but it can also be downloaded on its own, as described above.

The utility is not a supported product, meaning you cannot officially contact technical support about it. But this is offset by its simplicity and the fact that it is free.

You can learn more about Arcserve products by reading our blog and following the links in the right column.

Source: https://habr.com/ru/post/303404/

